Hierarchical Filter based Document Clustering Algorithm

International Journal of Computer Trends and Technology (IJCTT)          
© 2015 by IJCTT Journal
Volume-21 Number-1
Year of Publication : 2015
Authors : Mulluri Raghupathi, R. Lakshmi Tulasi


Mulluri Raghupathi, R. Lakshmi Tulasi "Hierarchical Filter based Document Clustering Algorithm". International Journal of Computer Trends and Technology (IJCTT) V21(1):34-40, March 2015. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract -
Clustering is one of the most important tasks in data mining. Its goal is to find the fundamental structures in data and to categorize the data into meaningful subgroups for further study and analysis. Existing K-Means clustering with the multi-viewpoint-based similarity (MVS) measure does not position the data points optimally when forming clusters, which leads to suboptimal clustering solutions. Using multiple viewpoints, a more informative assessment of similarity can be achieved; theoretical analysis and an empirical study are conducted to support this claim. Two criterion functions for document clustering are proposed based on this new measure. We compare them with several well-known clustering algorithms that use other popular similarity measures on various document collections to verify the advantages of our proposal. In the proposed approach, multiview clustering is applied to two applications: clustering text documents and real-time clustering of documents stored on local disks. The proposed approach gives better clustering accuracy across data sets of different sizes.
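The abstract does not define the MVS measure itself. The sketch below assumes the multi-viewpoint formulation usually associated with that name: the similarity of two documents d_i and d_j in the same cluster is the average, over viewpoint documents d_h lying outside that cluster, of the dot product (d_i - d_h) . (d_j - d_h). Documents are assumed to be unit-length term vectors (for example, normalized TF-IDF), and the helper names `mvs_similarity` and `mvs_in_cluster` are hypothetical, not taken from the paper.

```python
from typing import List, Sequence, Set

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sub(u: Vector, v: Vector) -> List[float]:
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def mvs_similarity(di: Vector, dj: Vector, viewpoints: Sequence[Vector]) -> float:
    """Average of (di - dh) . (dj - dh) over all viewpoint vectors dh.

    Assumption: each viewpoint dh is a document outside the cluster that
    contains di and dj, so similarity is judged from several vantage points
    instead of only from the origin, which is what plain cosine similarity does.
    """
    if not viewpoints:
        raise ValueError("at least one viewpoint is required")
    total = sum(dot(sub(di, dh), sub(dj, dh)) for dh in viewpoints)
    return total / len(viewpoints)


def mvs_in_cluster(docs: Sequence[Vector], cluster: Set[int], i: int, j: int) -> float:
    """Hypothetical helper: MVS of docs[i] and docs[j], both members of
    `cluster` (a set of indices), using every non-member as a viewpoint."""
    viewpoints = [d for h, d in enumerate(docs) if h not in cluster]
    return mvs_similarity(docs[i], docs[j], viewpoints)


if __name__ == "__main__":
    # Toy unit-length "document" vectors (illustrative data, not from the paper).
    docs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
    print(mvs_in_cluster(docs, {0, 1}, 0, 1))
```

Because the viewpoints are the documents outside the cluster, the score for the same pair of documents changes whenever cluster membership changes, so a clustering loop built on this sketch would have to recompute it after each reassignment.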
