An Improved Swarm Based Approach for Efficient Document Clustering

International Journal of Computer Trends and Technology (IJCTT)          
© - June Issue 2013 by IJCTT Journal
Volume-4 Issue-6                           
Year of Publication : 2013
Authors: Kanika Khanna, Madan Lal Yadav


Kanika Khanna, Madan Lal Yadav, "An Improved Swarm Based Approach for Efficient Document Clustering", International Journal of Computer Trends and Technology (IJCTT), V4(6):1598-1603, June Issue 2013. ISSN Published by Seventh Sense Research Group.

Abstract: Clustering is a basic and important data mining approach, used both independently and as a pre-processing stage in many data mining applications. The clustering process divides the available dataset into smaller subsets called clusters. These clusters are generally substantially different from one another. In the present work, clustering is performed on text documents. Text document clustering divides the available documents into subgroups based on clustering parameters. Document clustering supports a number of basic tasks, such as document organization, topic extraction, and information retrieval. In this work, an improved clustering approach is defined over the basic clustering approaches. The basic clustering approaches that we improve in this work are K-Means Clustering and C-Means Clustering. The improvement is achieved through the inclusion of PSO (Particle Swarm Optimization).
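
The abstract does not state how PSO is combined with K-Means. One common hybrid, assumed here purely for illustration, lets a particle swarm search for good starting centroids and then refines the best candidate with ordinary K-Means iterations. The sketch below follows that assumption on toy term-frequency vectors; the function names (`pso_centroids`, `kmeans`), the PSO parameters (`w`, `c1`, `c2`), and the sample data are all hypothetical and are not taken from the paper. The C-Means variant is not covered.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(centroids, points):
    # Fitness: sum of squared distances from each point to its nearest centroid.
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def pso_centroids(points, k, n_particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    # Assumed hybrid: every particle encodes one candidate set of k centroids.
    rng = random.Random(seed)
    dim = len(points[0])
    swarm = [[list(p) for p in rng.sample(points, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in part] for part in swarm]
    pbest_fit = [sse(part, points) for part in swarm]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    for _ in range(iters):
        for i, part in enumerate(swarm):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard PSO velocity update: inertia + cognitive + social terms.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - part[j][d])
                                    + c2 * r2 * (gbest[j][d] - part[j][d]))
                    part[j][d] += vel[i][j][d]
            fit = sse(part, points)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in part], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in part], fit
    return gbest

def kmeans(points, centroids, iters=20):
    # Plain K-Means refinement starting from the given centroids.
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            clusters[idx].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [[sum(col) / len(cl) for col in zip(*cl)] if cl else c
                     for cl, c in zip(clusters, centroids)]
    labels = [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
              for p in points]
    return centroids, labels

# Toy term-frequency vectors over a hypothetical 4-word vocabulary.
docs = [[3, 2, 0, 0], [2, 3, 0, 0], [4, 1, 0, 0],
        [0, 0, 3, 2], [0, 0, 2, 4], [0, 1, 3, 3]]
seeds = pso_centroids(docs, k=2)
centroids, labels = kmeans(docs, seeds)
print(labels)
```

In this sketch the swarm only chooses the starting point; K-Means still performs the final assignment. The intent is to reduce K-Means' sensitivity to its initial centroids, which is one plausible reading of the improvement described in the abstract.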



Keywords: Pre-processing, Clustering, Extraction, K-Means, C-Means, PSO