Duplicate Detection with Map Reduce and Deletion Procedure

Ms. Tanvee Meshram; Prof. Nivedita Kadam

doi:https://doi.org/10.14445/22312803/IJCTT-V48P112

Volume 48 | Number 1 | Year 2017 | Article Id. IJCTT-V48P112 | DOI : https://doi.org/10.14445/22312803/IJCTT-V48P112

Citation:

Ms. Tanvee Meshram, Prof. Nivedita Kadam, "Duplicate Detection with Map Reduce and Deletion Procedure," International Journal of Computer Trends and Technology (IJCTT), vol. 48, no. 1, pp. 51-53, 2017. Crossref, https://doi.org/10.14445/22312803/IJCTT-V48P112

Abstract

In real-world databases, an entity is often represented by two or more records. Duplicate detection is the process of identifying all cases in which multiple representations refer to the same real-world object. A representative application is customer relationship management, where a company loses money by sending multiple catalogs to the same person, which also lowers customer satisfaction. Another application is data mining, where correct input data is necessary to construct useful reports that form the basis of mechanisms. This paper studies a progressive duplicate detection algorithm that uses Map Reduce to detect duplicate records and then delete them.
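The general idea can be illustrated with a minimal in-memory sketch. It is not the authors' implementation and does not use a real Map Reduce framework: the map step groups records by a blocking key, the reduce step compares sorted neighbors at increasing rank distance (in the spirit of a progressive sorted neighborhood method), and a final step deletes the detected duplicates. The `name` and `id` fields, the two-character prefix key, the 0.85 similarity threshold, and the rule of keeping the record with the lowest id are all assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def map_phase(records, key_len=2):
    """Map: emit (blocking_key, record); the key is an assumed name prefix."""
    for rec in records:
        yield rec["name"].lower().replace(" ", "")[:key_len], rec


def shuffle(pairs):
    """Group mapped records by key, as a Map Reduce framework would."""
    groups = defaultdict(list)
    for key, rec in pairs:
        groups[key].append(rec)
    return groups


def similar(a, b, threshold=0.85):
    """Assumed similarity test: character-level ratio on the name field."""
    ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return ratio >= threshold


def reduce_phase(block, max_window=3):
    """Reduce: compare sorted neighbors at rank distance 1 first, then 2, ...

    Close neighbors are the most likely duplicates, so they are checked first.
    Returns the ids of the records to delete (the higher id of each pair).
    """
    block = sorted(block, key=lambda r: r["name"].lower())
    duplicate_ids = set()
    for dist in range(1, max_window):
        for i in range(len(block) - dist):
            a, b = block[i], block[i + dist]
            if similar(a, b):
                duplicate_ids.add(max(a["id"], b["id"]))
    return duplicate_ids


def deduplicate(records):
    """Run map, shuffle, and reduce, then delete the detected duplicates."""
    duplicate_ids = set()
    for block in shuffle(map_phase(records)).values():
        duplicate_ids |= reduce_phase(block)
    return [r for r in records if r["id"] not in duplicate_ids]
```

Calling `deduplicate` on a list of dictionaries such as `{"id": 1, "name": "John Smith"}` returns the list with the later near-identical entries removed. A prefix-based blocking key misses duplicates whose first characters differ, so a production system would need a more robust key.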

Keywords

Duplicate Detection, Data Cleaning, PSNM, Map Reduce.
