Document Clustering in Web Search Engine

A.S.N.Chakravarthy; Deepthi.S; K.Satyatej; Sk.Nizmi; S.Sindhura

doi:https://doi.org/10.14445/22312803/IJCTT-V3I2P117

Research Article | Open Access

Volume 3 | Issue 2 | Year 2012 | Article Id. IJCTT-V3I2P117 | DOI : https://doi.org/10.14445/22312803/IJCTT-V3I2P117

Citation :

A.S.N.Chakravarthy, Deepthi.S, K.Satyatej, Sk.Nizmi, S.Sindhura, "Document Clustering in Web Search Engine," International Journal of Computer Trends and Technology (IJCTT), vol. 3, no. 2, pp. 286-289, 2012. Crossref, https://doi.org/10.14445/22312803/IJCTT-V3I2P117

Abstract

As the number of web pages grows, it becomes more difficult to find relevant documents through information retrieval engines. Clustering addresses this by grouping relevant documents together. The main purpose of clustering techniques is to partition a set of entities into different groups, called clusters. These groups may be consistent in terms of the similarity of their members. As the name suggests, representative-based clustering techniques use some form of representation for each cluster. Thus, every group has a member that represents it. The main benefit is a reduction in the cost of the algorithm; the use of representatives also makes the process easier to understand.
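
The idea of a representative per cluster can be illustrated with k-means, the technique named in the keywords. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the sample documents, the term-frequency vectors, the Euclidean distance, the choice of initial centroids, and all function names (`vectorize`, `distance`, `mean`, `k_means`) are illustrative choices not taken from the paper. In k-means the representative is the centroid (the mean of the cluster's vectors), which need not coincide with an actual document.

```python
import math
from collections import Counter


def vectorize(docs):
    """Turn each document into a term-frequency vector over a shared vocabulary."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vectors.append([float(counts[word]) for word in vocab])
    return vectors


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def k_means(vectors, initial_centroids, max_iter=100):
    """Assign each vector to its nearest centroid, then recompute the centroids.

    Stops when the assignments no longer change or after max_iter rounds.
    """
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = mean(members)
    return labels, centroids


# Hypothetical sample documents, chosen only for illustration.
docs = [
    "python code tutorial",
    "python code examples",
    "football match results",
    "football match highlights",
]
vectors = vectorize(docs)
# Assumption: seed the two centroids with the first and third documents.
labels, centroids = k_means(vectors, initial_centroids=[vectors[0], vectors[2]])
print(labels)  # [0, 0, 1, 1]
```

Each document is compared only against the two centroids rather than against every other document, which is where the cost reduction from using representatives comes from in this sketch.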

Keywords

Document clustering, k-means.
