Duplicate Detection with Map Reduce and Deletion Procedure

International Journal of Computer Trends and Technology (IJCTT)          
 
© 2017 by IJCTT Journal
Volume-48 Number-2
Year of Publication : 2017
Authors : Ms. Tanvee Meshram, Prof. Nivedita Kadam
DOI :  10.14445/22312803/IJCTT-V48P112

MLA

Ms. Tanvee Meshram, Prof. Nivedita Kadam. "Duplicate Detection with Map Reduce and Deletion Procedure". International Journal of Computer Trends and Technology (IJCTT) V48(2):51-53, June 2017. ISSN: 2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract -
Real-world entities are often represented by two or more records in a database. Duplicate detection is the task of identifying all cases in which multiple representations refer to the same real-world object. A representative application is customer relationship management, where a company loses money by sending multiple catalogs to the same person, which also lowers customer satisfaction. Another application is data mining, where correct input data is necessary to construct useful reports that form the basis of mechanisms. This paper studies a progressive duplicate detection algorithm that uses MapReduce to detect duplicate records and then delete them.
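The detect-then-delete idea described in the abstract can be illustrated with a minimal single-process sketch in Python. It is not the authors' implementation and not the progressive algorithm (PSNM) itself: it only imitates the map, shuffle, and reduce steps on an in-memory list. The record fields (`id`, `name`, `city`), the three-character blocking key, and the 0.9 similarity threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def map_phase(records):
    """Emit (blocking_key, record) pairs.

    The key (first three letters of the lower-cased name) is a
    hypothetical choice; records with different keys are never compared.
    """
    for rec in records:
        yield rec["name"].strip().lower()[:3], rec


def shuffle(pairs):
    """Group records by blocking key, as a MapReduce shuffle would."""
    groups = defaultdict(list)
    for key, rec in pairs:
        groups[key].append(rec)
    return groups


def similar(a, b, threshold=0.9):
    """Compare two records by string similarity (threshold is an assumption)."""
    text_a = f'{a["name"]} {a["city"]}'.lower()
    text_b = f'{b["name"]} {b["city"]}'.lower()
    return SequenceMatcher(None, text_a, text_b).ratio() >= threshold


def reduce_phase(group):
    """Return ids of records that duplicate an earlier record in the group."""
    kept, duplicate_ids = [], []
    for rec in group:
        if any(similar(rec, k) for k in kept):
            duplicate_ids.append(rec["id"])
        else:
            kept.append(rec)
    return duplicate_ids


def deduplicate(records):
    """Detect duplicates per block, then delete them from the record list."""
    to_delete = set()
    for group in shuffle(map_phase(records)).values():
        to_delete.update(reduce_phase(group))
    return [r for r in records if r["id"] not in to_delete]


records = [
    {"id": 1, "name": "John Smith", "city": "Pune"},
    {"id": 2, "name": "JOHN SMITH", "city": "pune"},
    {"id": 3, "name": "Mary Jones", "city": "Nagpur"},
]
cleaned = deduplicate(records)
print([r["id"] for r in cleaned])
```

In this sketch the first record seen in each block is kept and later matches are deleted. A real deployment would run the map and reduce functions on a MapReduce framework and would need a policy for which duplicate to retain.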

Keywords
Duplicate Detection, Data Cleaning, PSNM, Map Reduce.