A Scalable Two Phase Top Down Specialization Approach For Data Anonymization Using Mapreduce On Cloud

Sameesha Vs

doi:10.14445/22312803/IJCTT-V45P110

Research Article | Open Access

Volume 45 | Number 1 | Year 2017 | Article Id. IJCTT-V45P110 | DOI: https://doi.org/10.14445/22312803/IJCTT-V45P110

Citation:

Sameesha Vs, "A Scalable Two Phase Top Down Specialization Approach For Data Anonymization Using Mapreduce On Cloud," International Journal of Computer Trends and Technology (IJCTT), vol. 45, no. 1, pp. 45-49, 2017. Crossref, https://doi.org/10.14445/22312803/IJCTT-V45P110

Abstract

Many cloud services require users to share private data, such as electronic health records, for data analysis or mining, which raises privacy concerns. Anonymizing data sets via generalization to satisfy privacy requirements such as k-anonymity is a widely used category of privacy-preserving techniques. The scale of data in many cloud applications is growing rapidly in line with the Big Data trend, making it difficult for commonly used software tools to capture, manage, and process such large-scale data within a tolerable elapsed time. As a result, existing anonymization approaches struggle to preserve privacy on privacy-sensitive large-scale data sets because they do not scale well. In this paper, we propose a scalable two-phase top-down specialization (TDS) approach that anonymizes large-scale data sets using the MapReduce framework on the cloud. In both phases of our approach, we design a group of MapReduce jobs that carry out the specialization computation in a highly scalable way. Experimental evaluation results demonstrate that our approach significantly improves the scalability and efficiency of TDS over existing approaches.
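To make the terms in the abstract concrete, the following is a minimal Python sketch of generalization, a k-anonymity check, and a top-down search for the most specific generalization level that still satisfies k. It is an illustration under stated assumptions, not the paper's algorithm: the quasi-identifiers (age and zip code), the three fixed generalization levels, the sample records, and all function names are hypothetical, and the map and reduce steps run in a single process rather than as MapReduce jobs on a cluster. TDS as described in the literature specializes one attribute value at a time; this sketch lowers a single global level instead.

```python
from collections import defaultdict


def generalize(record, level):
    """Generalize the (age, zip_code) quasi-identifiers of one record.

    Level 0 keeps the original values; higher levels are coarser.
    The levels are a hypothetical taxonomy chosen for illustration.
    """
    age, zip_code = record
    if level == 0:
        return (str(age), zip_code)
    if level == 1:
        low = (age // 10) * 10
        return (f"{low}-{low + 9}", zip_code[:3] + "**")
    return ("*", "*****")


def map_phase(records, level):
    """Map step: emit (generalized quasi-identifier, 1) for each record."""
    for record in records:
        yield generalize(record, level), 1


def reduce_phase(pairs):
    """Reduce step: sum the counts for each generalized quasi-identifier."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def is_k_anonymous(records, level, k):
    """True if every generalized group contains at least k records."""
    counts = reduce_phase(map_phase(records, level))
    return all(count >= k for count in counts.values())


def top_down_level(records, k, max_level=2):
    """Start from the most general level and specialize while k still holds.

    If even the most general level violates k (fewer than k records),
    max_level is returned unchanged.
    """
    level = max_level
    while level > 0 and is_k_anonymous(records, level - 1, k):
        level -= 1
    return level


# Hypothetical sample data: (age, zip code) pairs.
records = [(34, "13053"), (36, "13068"), (31, "13053"),
           (47, "14850"), (42, "14853")]

if __name__ == "__main__":
    for k in (2, 3):
        print(f"k={k}: most specific level = {top_down_level(records, k)}")
```

In this sketch a larger k forces a coarser level, which reflects the usual trade-off between privacy and data utility that generalization-based anonymization has to balance.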
