International Journal of Computer Trends and Technology

Research Article | Open Access

Volume 3 | Issue 3 | Year 2012 | Article Id. IJCTT-V3I3P108 | DOI : https://doi.org/10.14445/22312803/IJCTT-V3I3P108

A Survey On Deduplication Methods


A.Faritha Banu, C. Chandrasekar

Citation :

A.Faritha Banu, C. Chandrasekar, "A Survey On Deduplication Methods," International Journal of Computer Trends and Technology (IJCTT), vol. 3, no. 3, pp. 343-347, 2012. Crossref, https://doi.org/10.14445/22312803/IJCTT-V3I3P108

Abstract

There is an increasing demand for systems that can provide secure data storage in a cost-effective manner. Duplicate records occupy additional space and increase access time, so there is a need to eliminate them. This sounds simple but is tedious in practice, because duplicate records often do not share a common key and/or contain errors that make duplicate matching a difficult task. Such errors arise from transcription mistakes, incomplete information, lack of standard formats, or any combination of these factors.
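
To illustrate why the absence of a common key makes matching difficult, the following is a minimal sketch of approximate record comparison. It is not a method taken from this survey: the field names, the sample records, the normalization rule, and the 0.85 threshold are all assumptions chosen for illustration, and the similarity measure is the one provided by Python's standard `difflib` module.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = value.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] for two field values."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_probable_duplicate(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Compare the fields two records share and average their similarity.

    The threshold is an assumed value; a real system would tune it on
    labeled data.
    """
    shared_fields = rec_a.keys() & rec_b.keys()
    if not shared_fields:
        return False
    score = sum(similarity(rec_a[f], rec_b[f]) for f in shared_fields) / len(shared_fields)
    return score >= threshold


# Hypothetical records: no shared key, and the first pair differs by a
# transcription error and by letter case.
record_1 = {"name": "John Smith", "city": "Springfield"}
record_2 = {"name": "Jon Smith", "city": "springfield"}
record_3 = {"name": "Mary Jones", "city": "Portland"}

print(is_probable_duplicate(record_1, record_2))
print(is_probable_duplicate(record_1, record_3))
```

Under these assumptions the first pair is flagged as a probable duplicate and the second is not. Exact comparison of the raw field values would miss the first pair entirely, which is the difficulty the abstract describes.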

Keywords

data storage, managing data, deduplication.
