Abstract
Companies’ websites are vulnerable to privacy attacks that can compromise data confidentiality. In sensitive use cases such as personal data, financial transaction details, and medical diagnoses, such a compromise could be detrimental and unethical. Companies’ noncompliance with the privacy policy requirements stipulated by the various data protection regulations has raised many concerns among users and other practitioners. To address this issue, previous research developed a model using conventional algorithms such as Neural Network (NN), Logistic Regression (LR) and Support Vector Machine (SVM) to evaluate companies’ levels of compliance with general data protection regulations. However, that model’s performance was unsatisfactory: across the selected core requirements of the legislation, it attained F1-scores of only 0.52–0.71. This paper improves on that performance using Natural Language Processing (NLP) and Deep Learning (DL) techniques, training the proposed models on the same dataset used in the previous research. The overall results show that the Long Short-Term Memory (LSTM) model outperforms both the Gated Recurrent Unit (GRU) and Convolutional Neural Network (CNN) models in terms of F1-score and accuracy. This research is intended to help the Supervisory Authority and other practitioners better determine how well companies’ privacy policies comply with the relevant data protection regulations.
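The abstract compares the GRU, CNN and LSTM models by F1-score and accuracy. As a general illustration (not the authors' evaluation code), the minimal Python sketch below computes both metrics for binary compliance labels. It assumes that 1 means a policy passage satisfies a given requirement and 0 means it does not. The label vectors are invented for illustration and do not come from the paper's dataset.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1-score: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive predictions: precision and recall are 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical gold labels and model predictions for one requirement.
gold = [1, 1, 1, 0, 0, 0, 1, 0]
pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(f1_score(gold, pred), accuracy(gold, pred))  # 0.75 0.75

# Imbalanced case: predicting "non-compliant" everywhere looks accurate but has F1 = 0.
gold_rare = [0] * 9 + [1]
pred_none = [0] * 10
print(f1_score(gold_rare, pred_none), accuracy(gold_rare, pred_none))  # 0.0 0.9
```

Because F1-score is the harmonic mean of precision and recall, accuracy can inflate performance when one class dominates, but F1 does not. In the second example, the classifier reaches 0.9 accuracy without ever detecting the positive class.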
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
John, S., Ajayi, B.A., Marafa, S.M. (2022). Natural Language Processing and Deep Learning Based Techniques for Evaluation of Companies’ Privacy Policies. In: Gervasi, O., Murgante, B., Misra, S., Rocha, A.M.A.C., Garau, C. (eds) Computational Science and Its Applications – ICCSA 2022 Workshops. ICCSA 2022. Lecture Notes in Computer Science, vol 13377. Springer, Cham. https://doi.org/10.1007/978-3-031-10536-4_2
DOI: https://doi.org/10.1007/978-3-031-10536-4_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10535-7
Online ISBN: 978-3-031-10536-4
eBook Packages: Computer Science; Computer Science (R0)