A Comparative study on clustering of data using Improved K-means Algorithms

Abhilash C B; Sharana basavanagowda

doi:https://doi.org/10.14445/22312803/IJCTT-V4I4P166

Research Article | Open Access

Volume 4 | Issue 4 | Year 2013 | Article Id. IJCTT-V4I4P166

Citation:

Abhilash C B, Sharana basavanagowda, "A Comparative study on clustering of data using Improved K-means Algorithms," International Journal of Computer Trends and Technology (IJCTT), vol. 4, no. 4, pp. 771-778, 2013. Crossref, https://doi.org/10.14445/22312803/IJCTT-V4I4P166

Abstract

There exist many algorithms for clustering, and the most widely used is the K-means algorithm, as it is easy to understand and to simulate on different datasets. In this work we used the K-means algorithm to cluster the yeast and iris datasets; the clustering resulted in lower accuracy and required more iterations. We then simulate an improved version of K-means for clustering these datasets; the Improved K-means algorithm uses a minimum spanning tree (MST). An undirected graph is generated over all input data points and the shortest distances are then calculated, which in turn results in better accuracy with fewer iterations. Both algorithms were implemented and simulated in the Java programming language, and the results obtained from them have been compared and analysed. The algorithms were run several times with different numbers of cluster groups, and the analysis showed that the Improved K-means algorithm performs better than the K-means algorithm. The Improved K-means algorithm also showed that as the number of clusters increases, its accuracy increases. We further inferred from the results that the accuracy of the Improved K-means algorithm is optimal at a particular value of K (the number of cluster groups).
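The abstract says only that Improved K-means builds an undirected graph over the data points and uses a minimum spanning tree; it does not give the procedure. The Java sketch below illustrates one common way an MST can be combined with K-means, and it is an assumption, not the paper's algorithm. Prim's algorithm builds the MST over the complete Euclidean graph, the K-1 longest tree edges are cut to form K initial groups, and standard Lloyd K-means then refines those groups while counting assignment passes. The class `MstKMeans`, its method names, and the toy data are hypothetical.

```java
import java.util.*;

/** Hypothetical MST-seeded K-means sketch (not the authors' code). ASSUMPTION: the MST only picks
 *  initial clusters by cutting its k-1 longest edges; Lloyd K-means then refines them (Euclidean). */
public class MstKMeans {
    /** Euclidean distance between two points of equal dimension. */
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return Math.sqrt(s);
    }

    /** Prim's algorithm on the complete undirected graph; parent[v] is v's MST neighbour, -1 for the root. */
    static int[] primMst(double[][] pts) {
        int n = pts.length;
        int[] parent = new int[n];
        double[] best = new double[n];  // cheapest known edge from v into the growing tree
        boolean[] inTree = new boolean[n];
        Arrays.fill(parent, -1);
        Arrays.fill(best, Double.POSITIVE_INFINITY);
        best[0] = 0;
        for (int step = 0; step < n; step++) {
            int u = -1;  // closest vertex not yet in the tree
            for (int v = 0; v < n; v++)
                if (!inTree[v] && (u == -1 || best[v] < best[u])) u = v;
            inTree[u] = true;
            for (int v = 0; v < n; v++) {
                double d = dist(pts[u], pts[v]);
                if (!inTree[v] && d < best[v]) { best[v] = d; parent[v] = u; }
            }
        }
        return parent;
    }

    /** Initial labels: cut the k-1 longest MST edges and number the components 0..k-1 (assumes k <= n). */
    static int[] mstSeedLabels(double[][] pts, int k) {
        int n = pts.length;
        int[] parent = primMst(pts);
        double[] edge = new double[n];  // length of edge v-parent[v]; -1 for the root (never cut)
        for (int v = 0; v < n; v++) edge[v] = parent[v] == -1 ? -1 : dist(pts[v], pts[parent[v]]);
        Integer[] order = new Integer[n];
        for (int v = 0; v < n; v++) order[v] = v;
        Arrays.sort(order, (a, b) -> Double.compare(edge[b], edge[a]));  // longest edge first
        boolean[] cut = new boolean[n];
        for (int i = 0; i < k - 1; i++) cut[order[i]] = true;
        Map<Integer, Integer> labelOf = new HashMap<>();
        int[] labels = new int[n];
        for (int v = 0; v < n; v++) {
            int top = v;  // climb towards the root until a cut edge or the root is reached
            while (!cut[top] && parent[top] != -1) top = parent[top];
            Integer lab = labelOf.get(top);
            if (lab == null) { lab = labelOf.size(); labelOf.put(top, lab); }
            labels[v] = lab;
        }
        return labels;
    }

    /** Mean of each cluster; an empty cluster keeps its previous centroid. */
    static double[][] centroids(double[][] pts, int[] labels, double[][] prev, int k) {
        double[][] c = new double[k][pts[0].length];
        int[] count = new int[k];
        for (int i = 0; i < pts.length; i++) {
            count[labels[i]]++;
            for (int d = 0; d < pts[i].length; d++) c[labels[i]][d] += pts[i][d];
        }
        for (int j = 0; j < k; j++) {
            if (count[j] == 0) { if (prev != null) c[j] = prev[j].clone(); continue; }
            for (int d = 0; d < c[j].length; d++) c[j][d] /= count[j];
        }
        return c;
    }

    /** Lloyd's K-means from the given labels (updated in place); returns assignment passes until no label changes. */
    static int kMeans(double[][] pts, int[] labels, int k, int maxIter) {
        double[][] c = centroids(pts, labels, null, k);
        for (int iter = 1; iter <= maxIter; iter++) {
            boolean changed = false;
            for (int i = 0; i < pts.length; i++) {
                int bestJ = 0;  // index of the nearest centroid
                for (int j = 1; j < k; j++)
                    if (dist(pts[i], c[j]) < dist(pts[i], c[bestJ])) bestJ = j;
                if (bestJ != labels[i]) { labels[i] = bestJ; changed = true; }
            }
            if (!changed) return iter;
            c = centroids(pts, labels, c, k);
        }
        return maxIter;
    }

    public static void main(String[] args) {
        // Toy 2-D data with two well-separated groups (illustrative only, not from the paper).
        double[][] pts = { {1, 1}, {1.2, 0.8}, {0.9, 1.1}, {8, 8}, {8.1, 7.9}, {7.8, 8.2} };
        int[] labels = mstSeedLabels(pts, 2);
        int iters = kMeans(pts, labels, 2, 100);
        System.out.println(Arrays.toString(labels) + " after " + iters + " iteration(s)");
    }
}
```

On the toy data in `main`, the single longest MST edge is the one bridging the two groups, so cutting it already separates them and the program prints `[0, 0, 0, 1, 1, 1] after 1 iteration(s)`. Random seeding may need more passes on harder data, but whether this sketch reproduces the iteration savings reported in the paper cannot be inferred from the abstract.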

Keywords

K-Means, MST, Improved K-Means, Yeast dataset, Iris dataset.
