A Comparative study on clustering of data using Improved K-means Algorithms

International Journal of Computer Trends and Technology (IJCTT), Volume 4, Issue 4, April 2013
Authors: Abhilash C B, Sharana basavanagowda

Abhilash C B, Sharana basavanagowda, "A Comparative study on clustering of data using Improved K-means Algorithms", International Journal of Computer Trends and Technology (IJCTT), V4(4):771-778, April 2013. ISSN 2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract: Many clustering algorithms exist, and K-means is the most widely used because it is easy to understand and to apply to different datasets. In this work we used K-means to cluster the yeast and iris datasets, and the clustering gave lower accuracy and required more iterations. We therefore simulate an improved version of K-means for clustering these datasets. The Improved K-means algorithm uses a minimum spanning tree (MST): an undirected graph is built over all input data points and the shortest distances are then computed, which in turn yields better accuracy with fewer iterations. Both algorithms were implemented in Java, and their results were compared and analysed. The algorithms were run several times with different numbers of cluster groups, and the analysis showed that Improved K-means performs better than K-means. The results for Improved K-means also showed that its accuracy increases as the number of clusters increases, and that its accuracy is optimal at a particular value of K (the number of cluster groups).
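The abstract only outlines the improved algorithm: an undirected graph over all data points, a minimum spanning tree, and shortest distances that feed K-means. As a hedged illustration in Java, the language the authors report using, the following sketch shows one common way to combine an MST with K-means. It assumes Euclidean distance, builds the MST with Prim's algorithm on the complete graph, cuts the k-1 longest MST edges, and uses the means of the resulting k components as the initial centroids for standard Lloyd-style K-means. These choices, and the names `MstKMeans`, `mstSeeds`, `lloyd` and `Result`, are assumptions for illustration, not the authors' implementation.

```java
import java.util.Arrays;

/**
 * Sketch only: K-means (Lloyd's algorithm) seeded from a minimum spanning tree.
 * Assumed, not from the paper: Prim's MST over Euclidean distances, cutting the k-1 longest edges.
 */
final class MstKMeans {
    /** Hypothetical result holder: final cluster labels and the number of assignment passes. */
    record Result(int[] labels, int iterations) {}

    /** Euclidean distance between two points of the same dimension. */
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    /** Prim's algorithm on the complete undirected graph; parent[v] is v's MST neighbour toward point 0. */
    static int[] primParents(double[][] x) {
        int n = x.length;
        double[] best = new double[n];
        int[] parent = new int[n];
        boolean[] inTree = new boolean[n];
        Arrays.fill(best, Double.POSITIVE_INFINITY);
        best[0] = 0;
        for (int step = 0; step < n; step++) {
            int u = -1;                                   // cheapest vertex not yet in the tree
            for (int v = 0; v < n; v++)
                if (!inTree[v] && (u == -1 || best[v] < best[u])) u = v;
            inTree[u] = true;
            for (int v = 0; v < n; v++) {                 // relax edges leaving u
                double d = dist(x[u], x[v]);
                if (!inTree[v] && d < best[v]) { best[v] = d; parent[v] = u; }
            }
        }
        return parent;
    }

    /** Seeds: means of the k components left after cutting the k-1 longest MST edges (requires n >= k). */
    static double[][] mstSeeds(double[][] x, int k) {
        int n = x.length, dim = x[0].length;
        int[] parent = primParents(x);
        Integer[] edges = new Integer[n - 1];             // edge v is (v, parent[v]) for v = 1..n-1
        double[] w = new double[n];
        for (int v = 1; v < n; v++) { edges[v - 1] = v; w[v] = dist(x[v], x[parent[v]]); }
        Arrays.sort(edges, (a, b) -> Double.compare(w[b], w[a]));   // longest edges first
        boolean[] cut = new boolean[n];
        for (int i = 0; i < k - 1; i++) cut[edges[i]] = true;
        int[] compId = new int[n];
        Arrays.fill(compId, -1);
        int next = 0;
        double[][] seeds = new double[k][dim];
        int[] counts = new int[k];
        for (int v = 0; v < n; v++) {
            int r = v;                                    // climb until point 0 or a cut edge
            while (r != 0 && !cut[r]) r = parent[r];
            if (compId[r] == -1) compId[r] = next++;
            int c = compId[r];
            counts[c]++;
            for (int d = 0; d < dim; d++) seeds[c][d] += x[v][d];
        }
        for (int c = 0; c < k; c++) for (int d = 0; d < dim; d++) seeds[c][d] /= counts[c];
        return seeds;
    }

    /** Lloyd's K-means from the given centroids (updated in place); stops when no label changes. */
    static Result lloyd(double[][] x, double[][] centroids, int maxIter) {
        int n = x.length, k = centroids.length, dim = x[0].length;
        int[] label = new int[n];
        Arrays.fill(label, -1);
        int it = 0;
        while (it < maxIter) {
            it++;
            boolean changed = false;
            for (int i = 0; i < n; i++) {                 // assignment step: nearest centroid
                int bestC = 0;
                for (int c = 1; c < k; c++)
                    if (dist(x[i], centroids[c]) < dist(x[i], centroids[bestC])) bestC = c;
                if (label[i] != bestC) { label[i] = bestC; changed = true; }
            }
            if (!changed) break;
            double[][] sums = new double[k][dim];         // update step: recompute means
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[label[i]]++;
                for (int d = 0; d < dim; d++) sums[label[i]][d] += x[i][d];
            }
            for (int c = 0; c < k; c++)
                if (counts[c] > 0)                        // an empty cluster keeps its old centroid
                    for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
        }
        return new Result(label, it);
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 1.0}, {1.2, 0.8}, {0.9, 1.1}, {5.0, 5.0}, {5.1, 4.9}, {4.8, 5.2}};
        Result r = lloyd(data, mstSeeds(data, 2), 100);
        System.out.println(Arrays.toString(r.labels()) + " after " + r.iterations() + " iterations");
    }
}
```

On the six toy points in `main`, the single longest MST edge joins the two groups, so cutting it places the seeds at the two group means and the loop stops on its second pass, printing `[0, 0, 0, 1, 1, 1] after 2 iterations`. Prim's algorithm on the complete graph costs O(n²) distance evaluations, which stays manageable for small benchmark sets such as iris. A random-seed baseline can be compared by passing other initial centroids to `lloyd` and reading `iterations()`.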

Keywords: K-means, minimum spanning tree (MST), Improved K-means, yeast dataset, iris dataset.