Automatic Database Clustering: Issues and Algorithms

Sakshi Kumar; Mahesh Singh; Sunil Sharma

doi:https://doi.org/10.14445/22312803/IJCTT-V10P136

Research Article | Open Access

Volume 10 | Number 2 | Year 2014 | Article Id. IJCTT-V10P136 | DOI : https://doi.org/10.14445/22312803/IJCTT-V10P136

Citation:

Sakshi Kumar, Mahesh Singh, Sunil Sharma, "Automatic Database Clustering: Issues and Algorithms," International Journal of Computer Trends and Technology (IJCTT), vol. 10, no. 2, pp. 208-213, 2014. Crossref, https://doi.org/10.14445/22312803/IJCTT-V10P136

Abstract

Clustering is the process of grouping data, where the groups are established by finding similarities between data items based on their characteristics. Such groups are termed clusters. Clustering is an unsupervised learning problem that groups objects based on distance or similarity. While much work has been published on clustering data on storage media, little has been done to automate this process. What is needed is an automatic and dynamic database clustering technique that re-clusters a database with little intervention from a database administrator (DBA) and maintains an acceptable query response time at all times. Good physical clustering of data on disk is essential to reducing the number of disk I/Os needed to answer a query, whether clustering is implemented by itself or coupled with indexing, parallelism, or buffering. In this paper we describe the issues faced when designing an automatic and dynamic database clustering technique for relational databases. We also perform a comparative study of clustering algorithms across two different data items. The performance of the clustering algorithms is compared based on the time taken to form the estimated clusters, and the experimental results are depicted as a graph.
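
As a rough illustration of the two ideas in the abstract (grouping objects by distance, and comparing algorithms by the time taken to form clusters), the sketch below implements a plain k-means loop and times it. The abstract does not name the algorithms that were compared, so the choice of k-means, the function names `kmeans` and `timed`, the sample points, and the initialization rule are all assumptions made for illustration, not the authors' method.

```python
import time
from math import dist


def kmeans(points, k, max_iter=100):
    """Group points into k clusters by Euclidean distance (Lloyd's algorithm)."""
    # Assumption: the first k points are used as the initial centers.
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign each point to the index of its nearest center.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments did not change, so the loop has converged
        labels = new_labels
        # Move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.9, 1.1), (8.2, 7.9), (7.8, 8.1)]
(labels, centers), elapsed = timed(kmeans, points, 2)
print(labels, f"{elapsed:.6f} s")
```

Wrapping any other clustering function in the same `timed` helper would give a like-for-like timing comparison on identical input, which is the kind of measurement the abstract describes.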

Keywords

Cluster, Cluster Analyzer, Database Clustering, Nodes
