Ensuring Data Confidentiality through an In-Depth Analysis of Advanced Privacy-Preserving Methodologies in Data Science

© 2025 by IJCTT Journal
Volume-73 Issue-2
Year of Publication : 2025
Authors : Vikas Kumar Jain, Jayesh Patil, Ashish Kumar Jain
DOI : 10.14445/22312803/IJCTT-V73I2P109
How to Cite?
Vikas Kumar Jain, Jayesh Patil, Ashish Kumar Jain, "Ensuring Data Confidentiality through an In-Depth Analysis of Advanced Privacy-Preserving Methodologies in Data Science," International Journal of Computer Trends and Technology, vol. 73, no. 2, pp. 71-79, 2025. Crossref, https://doi.org/10.14445/22312803/IJCTT-V73I2P109
Abstract
Privacy-preserving methods are of central importance in information technology because they guarantee ethical and secure data usage. This review provides an in-depth analysis of state-of-the-art privacy-preserving methods, including differential privacy, Secure Multi-Party Computation (SMPC), homomorphic encryption, federated learning, and anonymization techniques. It covers the theoretical foundations, practical applications, limitations, and future directions of these methods, with a focus on recent developments. After briefly introducing the privacy risks inherent in data science, the review presents anonymization techniques such as generalization, suppression, k-anonymity, l-diversity, and t-closeness. The concepts and applications of homomorphic encryption, SMPC, differential privacy, and federated learning are then examined, highlighting their effectiveness in protecting sensitive data while enabling collaborative data analysis. To underscore the significance of privacy-preserving strategies in practice, the study surveys real-world implementations in sectors such as healthcare, finance, telecommunications, social media, and government. Besides identifying key challenges such as scalability, usability, and resistance to adversarial attacks, the study outlines potential research directions for further development in this area. By presenting an in-depth review of privacy-preserving methods and their ethical implications, this work aims to give scholars, policymakers, and practitioners a deeper understanding of how to advance ethical and sustainable data-driven decision-making.
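To make the differential-privacy notion surveyed in the abstract concrete, the following is a minimal sketch (not taken from the paper) of the standard Laplace mechanism: a numeric query with known L1 sensitivity is privatized by adding Laplace noise with scale sensitivity/epsilon. The function name `laplace_mechanism` and the example values are illustrative assumptions.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Return an epsilon-differentially private estimate of a numeric query.

    Adds Laplace(sensitivity / epsilon) noise, the standard mechanism for
    queries whose L1 sensitivity (max change from one individual) is known.
    """
    if rng is None:
        rng = np.random.default_rng()
    scale = sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=scale)

# Example: privatize a count query. A single person changes a count by at
# most 1, so the sensitivity is 1; epsilon = 0.5 is the privacy budget.
true_count = 1234
private_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
```

Smaller epsilon means stronger privacy but noisier answers; repeated queries consume the budget additively under basic composition.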
Keywords
Data confidentiality, Data science, Privacy algorithms, Data security, Decentralized systems.
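The anonymization notions named above (generalization, suppression, k-anonymity) can be illustrated with a short sketch, not drawn from the paper: a table satisfies k-anonymity when every combination of quasi-identifier values appears in at least k records. The helper name `satisfies_k_anonymity` and the toy records (with generalized ZIP codes and age ranges) are illustrative assumptions.

```python
from collections import Counter

def satisfies_k_anonymity(records, quasi_identifiers, k):
    """Check that every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Toy release where ZIP and age have already been generalized.
records = [
    {"zip": "462**", "age": "30-39", "disease": "flu"},
    {"zip": "462**", "age": "30-39", "disease": "cold"},
    {"zip": "452**", "age": "40-49", "disease": "asthma"},
]
satisfies_k_anonymity(records, ["zip", "age"], 2)  # False: the 452** group has only 1 record
```

Further generalizing or suppressing the lone 452** record would restore 2-anonymity, which is exactly the utility/privacy trade-off the l-diversity and t-closeness refinements try to manage.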
Reference
[1] Reza Shokri et al., “Membership Inference Attacks Against Machine Learning Models,” 2017 IEEE Symposium on Security and Privacy, San Jose, CA, USA, pp. 3-18, 2017.
[CrossRef] [Google Scholar] [Publisher Link]
[2] Latanya Sweeney, “k-Anonymity: A Model for Protecting Privacy,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 5, pp. 557-570, 2002.
[CrossRef] [Google Scholar] [Publisher Link]
[3] Ashwin Machanavajjhala et al., “L-Diversity: Privacy Beyond k-Anonymity,” ACM Transactions on Knowledge Discovery from Data, vol. 1, no. 1, pp. 1-52, 2007.
[CrossRef] [Google Scholar] [Publisher Link]
[4] Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian, “t-Closeness: Privacy Beyond k-Anonymity and l-Diversity,” 2007 IEEE 23rd International Conference on Data Engineering, Istanbul, Turkey, pp. 106-115, 2007.
[CrossRef] [Google Scholar] [Publisher Link]
[5] Arvind Narayanan, and Vitaly Shmatikov, “Robust De-Anonymization of Large Sparse Datasets,” 2008 IEEE Symposium on Security and Privacy, Oakland, CA, USA, pp. 111-125, 2008.
[CrossRef] [Google Scholar] [Publisher Link]
[6] Cynthia Dwork, “Differential Privacy: A Survey of Results,” Proceedings of the 5th International Conference on Theory and Applications of Models of Computation, Xi'an, China, pp. 1-19, 2008.
[CrossRef] [Google Scholar] [Publisher Link]
[7] K. Balle et al., “Improving Differential Privacy in Machine Learning,” Journal of Privacy and Confidentiality, 2020.
[8] Craig Gentry, “Fully Homomorphic Encryption Using Ideal Lattices,” STOC '09: Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, Bethesda, MD, USA, pp. 169-178, 2009.
[CrossRef] [Google Scholar] [Publisher Link]
[9] K. Lauter et al., “Computing on Encrypted Data,” IEEE Transactions on Information Theory, 2011.
[10] Adi Shamir, “How to Share a Secret,” Communications of the ACM, vol. 22, no. 11, pp. 612-613, 1979.
[CrossRef] [Google Scholar] [Publisher Link]
[11] Andrew C. Yao, “Protocols for Secure Computations,” 23rd Annual Symposium on Foundations of Computer Science, Chicago, IL, USA, pp. 160-164, 1982.
[CrossRef] [Google Scholar] [Publisher Link]
[12] Brendan McMahan et al., “Communication-Efficient Learning of Deep Networks from Decentralized Data,” Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, Florida, USA, pp. 1-10, 2017.
[Google Scholar] [Publisher Link]
[13] E. Bagdasaryan et al., “Backdoor Attacks on Federated Learning,” Advances in Neural Information Processing Systems, 2020.
[14] Nicolas Papernot et al., “Semi-Supervised Knowledge Transfer for Deep Learning from Private Training Data,” arXiv, pp. 1-16, 2017.
[CrossRef] [Google Scholar] [Publisher Link]
[15] Keith Bonawitz et al., “Practical Secure Aggregation for Privacy-Preserving Machine Learning,” CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, Texas, USA, pp. 1175-1191, 2017.
[CrossRef] [Google Scholar] [Publisher Link]
[16] P. Samarati, “Protecting Respondents' Identities in Microdata Release,” IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 6, pp. 1010-1027, 2001.
[CrossRef] [Google Scholar] [Publisher Link]
[17] Dan Boneh, and Brent Waters, “Conjunctive, Subset, and Range Queries on Encrypted Data,” Proceedings of the 4th Theory of Cryptography Conference, Amsterdam, The Netherlands, pp. 535-554, 2007.
[CrossRef] [Google Scholar] [Publisher Link]
[18] Y. Shokri et al., “Privacy-Preserving Deep Learning via Noisy Aggregation,” International Conference on Learning Representations, 2017.
[19] Daniel Kifer, and Ashwin Machanavajjhala, “No Free Lunch in Data Privacy,” SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, Athens Greece, pp. 193-204, 2011.
[CrossRef] [Google Scholar] [Publisher Link]
[20] Latanya Sweeney, “Simple Demographics Often Identify People Uniquely,” Carnegie Mellon University, Report, pp. 1-34, 2000.
[Google Scholar] [Publisher Link]
[21] Benjamin C. M. Fung et al., “Privacy-preserving Data Publishing: A Survey of Recent Developments,” ACM Computing Surveys, vol. 42, no. 4, pp. 1-53, 2010.
[CrossRef] [Google Scholar] [Publisher Link]
[22] Khaled El Emam et al., “A Globally Optimal k-Anonymity Method for the De-Identification of Health Data,” Journal of the American Medical Informatics Association, vol. 16, no. 5, pp. 670-682, 2009.
[CrossRef] [Google Scholar] [Publisher Link]
[23] Yves-Alexandre de Montjoye et al., “Unique in the Crowd: The Privacy Bounds of Human Mobility,” Scientific Reports, pp. 1-5, 2013.
[CrossRef] [Google Scholar] [Publisher Link]
[24] Frank McSherry, and Kunal Talwar, “Mechanism Design via Differential Privacy,” 48th Annual IEEE Symposium on Foundations of Computer Science, Providence, RI, USA, pp. 94-103, 2007.
[CrossRef] [Google Scholar] [Publisher Link]
[25] Masahiro Yagisawa, “Fully Homomorphic Encryption without Bootstrapping,” Cryptology ePrint Archive, Report, pp. 1-40, 2013.
[Google Scholar] [Publisher Link]
[26] Josep Domingo-Ferrer, and Vicenç Torra, “A Critique of k-Anonymity and Some of Its Enhancements,” 2008 Third International Conference on Availability, Reliability and Security, Barcelona, Spain, pp. 990-993, 2008.
[CrossRef] [Google Scholar] [Publisher Link]
[27] Michael Hay et al., “Resisting Structural Re-Identification in Anonymized Social Networks,” The VLDB Journal, vol. 19, pp. 797-823, 2010.
[CrossRef] [Google Scholar] [Publisher Link]
[28] Battista Biggio, and Fabio Roli, “Wild Patterns: Ten Years after the Rise of Adversarial Machine Learning,” Pattern Recognition Journal, vol. 84, pp. 317-331, 2018.
[CrossRef] [Google Scholar] [Publisher Link]
[29] Yehuda Lindell, and Benny Pinkas, “Privacy-Preserving Data Mining,” Advances in Cryptology - CRYPTO 2000: Proceedings of the 20th Annual International Cryptology Conference, Santa Barbara, California, USA, pp. 439-450, 2000.
[CrossRef] [Google Scholar] [Publisher Link]
[30] Jonathan Katz, and Yehuda Lindell, Introduction to Modern Cryptography: Principles and Protocols, 1st ed., Chapman & Hall/CRC, pp. 1-552, 2007.
[CrossRef] [Google Scholar] [Publisher Link]
[31] Shafi Goldwasser, and Yehuda Lindell, “Secure Multi-Party Computation Without Agreement,” Journal of Cryptology, vol. 18, pp. 247-287, 2005.
[CrossRef] [Google Scholar] [Publisher Link]
[32] Kallista Bonawitz et al., “Federated Learning and Privacy: Building Privacy-Preserving Systems for Machine Learning and Data Science on Decentralized Data,” Queue, vol. 19, no. 5, pp. 87-114, 2021.
[CrossRef] [Google Scholar] [Publisher Link]
[33] Jakub Konečný et al., “Federated Optimization: Distributed Machine Learning for On-Device Intelligence,” arXiv, pp. 1-38, 2016.
[CrossRef] [Google Scholar] [Publisher Link]
[34] Craig Gentry, and Shai Halevi, “Implementing Gentry's Fully Homomorphic Encryption Scheme,” Advances in Cryptology - EUROCRYPT 2011: Proceedings of the 30th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Tallinn, Estonia, pp. 129-148, 2011.
[CrossRef] [Google Scholar] [Publisher Link]
[35] Payman Mohassel, and Yupeng Zhang, “SecureML: A System for Scalable Privacy-Preserving Machine Learning,” 2017 IEEE Symposium on Security and Privacy, San Jose, CA, USA, pp. 19-38, 2017.
[CrossRef] [Google Scholar] [Publisher Link]
[36] Eugene Bagdasaryan et al., “How to Backdoor Federated Learning,” Proceedings of the Twenty-Third International Conference on Artificial Intelligence and Statistics, vol. 108, pp. 2938-2948, 2020.
[Google Scholar] [Publisher Link]