Introduction to KEA-Means Algorithm for Web Document Clustering

Swapnali Ware; N.A.Dhawas

doi:10.14445/22312803/IJCTT-V3I4P120

Research Article | Open Access

Volume 3 | Issue 4 | Year 2012 | Article Id. IJCTT-V3I4P120 | DOI : https://doi.org/10.14445/22312803/IJCTT-V3I4P120

Citation:

Swapnali Ware, N.A.Dhawas, "Introduction to KEA-Means Algorithm for Web Document Clustering," International Journal of Computer Trends and Technology (IJCTT), vol. 3, no. 4, pp. 495-498, 2012. Crossref, https://doi.org/10.14445/22312803/IJCTT-V3I4P120

Abstract

In most traditional document clustering techniques, the total number of clusters is not known in advance, and the cluster that contains the target information, or the precise information associated with a cluster, cannot be determined. The K-means algorithm addresses this problem by taking the number of clusters, k, as an input. However, when the value of k is changed, the precision of each result changes as well. To solve this problem, this paper introduces a new clustering algorithm, the KEA-Means algorithm, which combines K-means with KEA, the key phrase extraction algorithm. KEA returns several key phrases from the source documents using a machine learning technique: it builds a model containing rules that are used to generate the number of clusters of web documents in the dataset. The algorithm generates the number of clusters automatically at run time, so the user need not specify it. The KEA-Means clustering algorithm thus provides the value of k and will be beneficial for extracting test documents from massive quantities of resources.
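
The abstract does not state the rule that maps key phrases to a cluster count, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. The highest-weighted TF-IDF term of each document stands in for KEA (which is a trained model and is not reproduced here), and the rule "one cluster per distinct key term" is hypothetical. All function names and the sample documents are invented for the example.

```python
import math
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "is", "on", "as", "after"}

def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords and non-alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS]

def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document (smoothed TF-IDF)."""
    token_lists = [tokenize(d) for d in docs]
    df = Counter(t for toks in token_lists for t in set(toks))
    n = len(docs)
    return [
        {t: c * (math.log(n / df[t]) + 1.0) for t, c in Counter(toks).items()}
        for toks in token_lists
    ]

def top_key_term(vec):
    """Stand-in for KEA (assumption): highest-weighted term, ties broken alphabetically."""
    return min(vec, key=lambda t: (-vec[t], t))

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mean_vector(vecs):
    """Component-wise mean of sparse vectors."""
    total = {}
    for v in vecs:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vecs) for t, w in total.items()}

def kea_means_sketch(docs, max_iter=20):
    """Derive k from key terms, then run K-means with cosine similarity."""
    vecs = tfidf_vectors(docs)
    key_terms = [top_key_term(v) for v in vecs]
    # Hypothetical rule: one cluster per distinct key term,
    # seeded with the first document that carries that term.
    seeds = {}
    for i, term in enumerate(key_terms):
        seeds.setdefault(term, i)
    centroids = [vecs[i] for i in seeds.values()]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vecs]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return k, labels, key_terms

docs = [
    "stock market prices fall as stock traders sell",
    "stock market rally lifts stock prices",
    "football team wins football league match",
    "football coach praises team after football match",
]
k, labels, key_terms = kea_means_sketch(docs)
print(k, labels, key_terms)
# 2 [0, 0, 1, 1] ['stock', 'stock', 'football', 'football']
```

In this sketch the caller never passes k: it falls out of the key-term step, which is the behaviour the abstract describes. A real system would replace `top_key_term` with a trained key phrase extractor and would need a more careful rule for merging related phrases.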

Keywords

K-means clustering, Kea key phrase extraction algorithm, KEA-Means algorithm, F-measure.
