Preserving Data Mining Over Horizontally Data

International Journal of Computer Trends and Technology (IJCTT)          
© 2015 by IJCTT Journal
Volume-23 Number-4
Year of Publication : 2015
Authors : Kamlesh Yadav, Dr.Pushpender Sarao


Kamlesh Yadav, Dr.Pushpender Sarao "Preserving Data Mining Over Horizontally Data". International Journal of Computer Trends and Technology (IJCTT) V23(4):192-197, May 2015. ISSN:2231-2803. Published by Seventh Sense Research Group.

Abstract -
Data mining has been a popular research area for more than a decade because of its wide range of applications. Data mining is the field of extracting interesting patterns from large data collections, and such collections must be mined for the purpose of knowledge discovery. Organizations agree to pool their data for mining because they know that the mining results are valuable to them. However, the popularity and wide availability of data mining tools have also raised concerns about the privacy of individuals, since large data collections contain sensitive information about them. Organizations want to apply data mining to their data without leaking any sensitive information about their individuals to other organizations. The aim of privacy-preserving data mining research is therefore to develop data mining techniques that can be applied to such databases while disclosing nothing but the final results to all the sites. Privacy-preserving techniques are applied in many areas, such as medicine, bioinformatics, shopping, and credit card analysis, and have proved useful in all of them. They have been proposed for many data models, including classification on centralized data, association rule mining in distributed environments, and clustering over vertically partitioned data. In this dissertation, we propose methods for privacy preservation in a distributed environment. We construct a privacy-preserving dissimilarity matrix of objects stored at different sites, which can be used for privacy-preserving clustering and other operations. The construction relies on pairwise comparison of private, sensitive data objects that are distributed horizontally across multiple sites. All the sites taking part in the mining process are assumed to be semi-honest, that is, honest but curious.
In this dissertation we handle alphanumeric and categorical attributes as well as numeric ones. The dissimilarity matrix is constructed with the help of a third party, which performs the mining on the overall collected data. We show the communication and computation complexity of our protocol by conducting experiments on synthetically generated and real datasets. Each experiment is also performed for a basic protocol with no privacy protection; comparing the basic protocol with ours shows the overhead that security and privacy introduce.
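
To make the setting concrete, the following is a minimal sketch of what a baseline with no privacy protection might look like: every site sends its raw rows to the third party, which builds one dissimilarity matrix per attribute. It is not the protocol proposed in the paper. The sample rows, the attribute names, and the choice of distance per attribute type (absolute difference for numeric values, 0/1 mismatch for categorical values, edit distance for alphanumeric strings) are all assumptions made for illustration.

```python
from itertools import chain


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def numeric_matrix(values):
    """Pairwise absolute differences (assumed distance for numeric attributes)."""
    return [[abs(x - y) for y in values] for x in values]


def categorical_matrix(values):
    """Pairwise 0/1 mismatch (assumed distance for categorical attributes)."""
    return [[0 if x == y else 1 for y in values] for x in values]


def alphanumeric_matrix(values):
    """Pairwise edit distance (assumed distance for alphanumeric attributes)."""
    return [[edit_distance(x, y) for y in values] for x in values]


# Hypothetical horizontal partition: both sites hold different objects
# described by the same schema (age, blood_group, code).
site_a = [(34, "A", "abc1"), (51, "B", "abd1")]
site_b = [(29, "A", "xbc1")]

# Baseline without privacy: the third party sees the raw rows of every site.
all_rows = list(chain(site_a, site_b))
ages, groups, codes = (list(column) for column in zip(*all_rows))

age_d = numeric_matrix(ages)
group_d = categorical_matrix(groups)
code_d = alphanumeric_matrix(codes)
```

In this baseline the third party learns every attribute value. A privacy-preserving construction aims to yield the same matrices without revealing the underlying values, and the experiments measure the cost of doing so.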

