A Survey On Deduplication Methods

International Journal of Computer Trends and Technology (IJCTT)          
 
© - Issue 2012 by IJCTT Journal
Volume-3 Issue-3                           
Year of Publication : 2012
Authors: A. Faritha Banu, C. Chandrasekar

MLA

A. Faritha Banu, C. Chandrasekar. "A Survey On Deduplication Methods." International Journal of Computer Trends and Technology (IJCTT), V3(3):343-347, Issue 2012. ISSN 2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract: There is an increasing demand for systems that can provide secure data storage in a cost-effective manner. Duplicate records occupy extra space and increase access time, so they need to be eliminated. This sounds simple but is tedious in practice, because duplicate records often do not share a common key and/or contain errors that make duplicate matching a difficult task. Such errors result from transcription mistakes, incomplete information, lack of standard formats, or any combination of these factors.
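
To illustrate why records without a common key are hard to match, here is a minimal Python sketch, not a method taken from the paper. It compares records field by field using approximate string similarity from the standard-library difflib module. The function names, the sample records, and the 0.85 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(value.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_probable_duplicate(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Flag two records as likely duplicates when the average similarity of
    their shared fields reaches the threshold (0.85 is an arbitrary assumption)."""
    fields = rec_a.keys() & rec_b.keys()
    if not fields:
        return False
    score = sum(similarity(rec_a[f], rec_b[f]) for f in fields) / len(fields)
    return score >= threshold


# Hypothetical records: no shared key, one transcription error, inconsistent formatting.
a = {"name": "Jonathan Smith", "city": "New York"}
b = {"name": "Jonathon  Smith", "city": "new york"}
c = {"name": "Maria Garcia", "city": "Madrid"}

print(a == b)                       # exact comparison misses the duplicate
print(is_probable_duplicate(a, b))  # approximate comparison flags it
print(is_probable_duplicate(a, c))  # unrelated records are not flagged
```

An exact comparison treats the first two records as different, while the approximate comparison tolerates the misspelling and the formatting differences. In practice the threshold and the choice of similarity function would need tuning for the data at hand.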


Keywords: data storage, managing data, deduplication