International Journal of Computer Trends and Technology

Research Article | Open Access

Volume 4 | Issue 3 | Year 2013 | Article Id. IJCTT-V4I3P118 | DOI : https://doi.org/10.14445/22312803/IJCTT-V4I3P118

An NMF and Hierarchical Based Clustering Approach to support Multiviewpoint-Based


K.S.Jeen Marseline, A.Premalatha

Citation:

K.S.Jeen Marseline, A.Premalatha, "An NMF and Hierarchical Based Clustering Approach to support Multiviewpoint-Based," International Journal of Computer Trends and Technology (IJCTT), vol. 4, no. 3, pp. 285-291, 2013. Crossref, https://doi.org/10.14445/22312803/IJCTT-V4I3P118

Abstract

Clustering is an important technique in data mining. Its goal is to discover the similarity between data points within the intrinsic structure of the data and to group the data into clusters or subclusters. Existing systems select the next frequent itemset with a greedy method, which can reduce the overlap between documents across itemsets; a document may contain both the selected itemset and other remaining itemsets. The outcome of such clustering depends on the order in which the greedy approach chooses itemsets, and because clusters are not selected in a consistent order, the method can settle on a less optimal solution. To resolve this problem, the proposed system develops a novel hierarchical algorithm for document clustering that achieves high efficiency and performance by exploiting the cluster-overlapping phenomenon to design the cluster-merging criterion. Hierarchical agglomerative clustering starts with each data point as an individual cluster and, at every step, merges the most similar or nearest pair of clusters; this requires a definition of cluster similarity or distance. On top of this, we propose a multi-viewpoint clustering approach combined with the NMF clustering method. The experimental results compare the clustering output of the three algorithms.
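The sketch below (not part of the paper) illustrates, in Python with scikit-learn and SciPy, the kind of pipeline the abstract outlines: documents are weighted with TF-IDF, factorized with NMF, and then merged bottom-up by average-linkage agglomerative clustering under a cosine distance. The toy corpus, the number of latent factors, and the number of clusters are illustrative assumptions, not values from the paper.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from scipy.cluster.hierarchy import linkage, fcluster

# Toy corpus (placeholder data, not from the paper).
documents = [
    "document clustering groups similar documents",
    "hierarchical clustering merges similar document clusters",
    "matrix factorization learns topic factors",
    "non negative matrix factorization for topic models",
]

# TF-IDF weighting keeps the term-document matrix non-negative, as NMF requires.
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(documents)

# NMF projects each document onto a small set of non-negative latent factors.
nmf = NMF(n_components=2, init="nndsvd", random_state=0)
W = nmf.fit_transform(X)  # document-by-factor weight matrix

# Hierarchical agglomerative clustering: start with every document as its own
# cluster and repeatedly merge the closest pair under cosine distance.
Z = linkage(W, method="average", metric="cosine")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)

The multi-viewpoint similarity measure itself is not shown here; where this sketch uses plain cosine distance, the proposed system would substitute the multi-viewpoint-based criterion when deciding which pair of clusters to merge.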

Keywords

Clustering, Multi-view point, Hierarchical clustering, Hierarchical Agglomerative clustering, Cosine similarity, Non-Negative Matrix Factorization.
