A Generational Evolutionary Approach on Large Databases for Quality Records

 
International Journal of Computer Trends and Technology (IJCTT)          
 
© 2015 by IJCTT Journal
Volume-22 Number-3
Year of Publication : 2015
Authors : D.V.Divya Deepika, D.T.V. DharmajeeRao
DOI : 10.14445/22312803/IJCTT-V22P123

MLA

D.V.Divya Deepika, D.T.V. DharmajeeRao "A Generational Evolutionary Approach on Large Databases for Quality Records". International Journal of Computer Trends and Technology (IJCTT) V22(3):112-116, April 2015. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract -
Many digital libraries and e-commerce websites contain duplicate content. Numerous systems have previously been proposed to remove replica or duplicate items, and these approaches have been applied to different repositories to detect duplicate records. They produce organized, alignment-based results, identify near-duplicate and range-based matches, and reduce computation cost and time; however, the resulting data is not of high quality. To increase the data quality of digital libraries, the present system implements a new approach based on genetic programming. Genetic programming involves three major operations: selection, crossover, and mutation, all of which are performed on the database. Executing these operations applies a de-duplication function; after the duplicate records are removed, a suggested function is applied. Both functions work correctly while using fewer computational resources in their implementation. Compared with all previous approaches, the present approach imposes less burden and produces efficient, accurate results. It also provides good evidence-based results.
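The selection, crossover, and mutation operations described above can be illustrated with a minimal sketch. In this hypothetical example (the field names, labelled pairs, and 0.5 threshold are illustrative assumptions, not details from the paper), genetic programming's evolutionary loop tunes the weights of a record-similarity function used for de-duplication:

```python
import random

# Hypothetical labelled record pairs: (title similarity, author similarity)
# together with whether the pair is a known duplicate. Illustrative data only.
PAIRS = [
    ((0.95, 0.90), True), ((0.20, 0.10), False),
    ((0.85, 0.40), True), ((0.30, 0.80), False),
    ((0.99, 0.97), True), ((0.15, 0.25), False),
]

def fitness(weights):
    """Fraction of pairs classified correctly when the weighted
    similarity score is compared against an assumed 0.5 threshold."""
    correct = 0
    for (s1, s2), is_dup in PAIRS:
        score = weights[0] * s1 + weights[1] * s2
        if (score >= 0.5) == is_dup:
            correct += 1
    return correct / len(PAIRS)

def evolve(generations=30, pop_size=20, seed=1):
    rng = random.Random(seed)
    # Initial population: random weight vectors (the "individuals").
    pop = [[rng.random(), rng.random()] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fittest half of the population.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: combine genes from two parents.
            child = [a[0], b[1]]
            # Mutation: perturb one gene by a small random amount.
            i = rng.randrange(2)
            child[i] += rng.uniform(-0.1, 0.1)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))
```

In a real de-duplication system the individuals would typically encode whole similarity expressions (trees) rather than two weights, but the generational loop of selection, crossover, and mutation is the same.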

References
[1] M. Wheatley, “Operation Clean Data,” CIO Asia Magazine, http://www.cio-asia.com, Aug. 2004.
[2] N. Koudas, S. Sarawagi, and D. Srivastava, “Record Linkage: Similarity Measures and Algorithms,” Proc. ACM SIGMOD Int’l Conf. Management of Data, pp. 802-803, 2006.
[3] S. Chaudhuri, K. Ganjam, V. Ganti, and R. Motwani, “Robust and Efficient Fuzzy Match for Online Data Cleaning,” Proc. ACM SIGMOD Int’l Conf. Management of Data, pp. 313-324, 2003.
[4] I. Bhattacharya and L. Getoor, “Iterative Record Linkage for Cleaning and Integration,” Proc. Ninth ACM SIGMOD Workshop Research Issues in Data Mining and Knowledge Discovery, pp. 11-18, 2004.
[5] I.P. Fellegi and A.B. Sunter, “A Theory for Record Linkage,” J. Am. Statistical Assoc., vol. 66, no. 1, pp. 1183-1210, 1969.
[6] V.S. Verykios, G.V. Moustakides, and M.G. Elfeky, “A Bayesian Decision Model for Cost Optimal Record Matching,” The Very Large Databases J., vol. 12, no. 1, pp. 28-40, 2003.
[7] R. Bell and F. Dravis, “Is Your Data Dirty? and Does that Matter?,” Accenture White Paper, http://www.accenture.com, 2006.
[8] J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992.
[9] W. Banzhaf, P. Nordin, R.E. Keller, and F.D. Francone, Genetic Programming - An Introduction: On the Automatic Evolution of Computer Programs and Its Applications. Morgan Kaufmann Publishers, 1998.
[10] H.M. de Almeida, M.A. Gonçalves, M. Cristo, and P. Calado, “A Combined Component Approach for Finding Collection-Adapted Ranking Functions Based on Genetic Programming,” Proc. 30th Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 399-406, 2007.

Keywords
Genetic programming, de-duplication, record linkage, duplicate detection, data quality, digital libraries.