A Novel Clustering Method for Text Documents using Neutrosophic Logic

International Journal of Computer Trends and Technology (IJCTT)          
 
© 2016 by IJCTT Journal
Volume-36 Number-4
Year of Publication : 2016
Authors : Wesam AbdulKarem Hamood, Mohammad Naved Qureshi
DOI : 10.14445/22312803/IJCTT-V36P135

MLA

Wesam AbdulKarem Hamood, Mohammad Naved Qureshi "A Novel Clustering Method for Text Documents using Neutrosophic Logic". International Journal of Computer Trends and Technology (IJCTT) V36(4):197-203 June 2016. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract -
Classifying text documents is a crucial task because of the huge number of documents available online. Clustering, as a part of data mining, automates this process by grouping similar documents into a single cluster. Clusters let us organize similar text documents in one place, and they help us classify unknown documents in the future by assigning each one to a known cluster based on a similarity measure. Automatic clustering is usually based on words. So far, K-means clustering has been used to cluster documents, but it is not very accurate in assigning documents to clusters. In this paper we therefore use a novel clustering method based on Neutrosophic logic. Fuzzy logic deals with two values, the degree of truth and the degree of falsity, whereas Neutrosophic logic involves a third factor called indeterminacy. Indeterminacy applies when it is uncertain whether a particular document belongs to cluster i or cluster j. Our method has three phases. First, generate the dataset according to the relative frequency of words in each document. Second, choose seed documents for the different clusters using the Euclidean distance between documents. Finally, calculate the T, I, F (truth, indeterminacy, falsity) values for every document with respect to every cluster, and then decide the cluster for each document on the basis of these values.
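
The three phases can be illustrated with a minimal Python sketch. The abstract does not give the seed-selection rule or the T, I, F formulas, so everything specific here is an assumption made for illustration only: the function names (relative_frequencies, pick_seeds, tif_values, cluster) are hypothetical, seeds are picked by a farthest-first rule, T is a normalized inverse distance to each seed, F is 1 - T, and I measures how close the two best T values are. The sketch expects at least two documents.

```python
import math
from collections import Counter


def relative_frequencies(text, vocabulary):
    """Phase 1: relative frequency of each vocabulary word in one document."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero for an empty document
    return [counts[w] / total for w in vocabulary]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pick_seeds(vectors, k):
    """Phase 2 (assumed rule): farthest-first choice of k seed documents."""
    n = len(vectors)
    # Start from the two documents that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda p: euclidean(vectors[p[0]], vectors[p[1]]),
    )
    seeds = [first, second][:k]
    while len(seeds) < k:
        # Add the document whose nearest seed is as far away as possible.
        rest = [i for i in range(n) if i not in seeds]
        seeds.append(max(
            rest,
            key=lambda i: min(euclidean(vectors[i], vectors[s]) for s in seeds),
        ))
    return seeds


def tif_values(vector, seed_vectors):
    """Phase 3 (assumed formulas): one (T, I, F) triple per cluster."""
    dists = [euclidean(vector, s) for s in seed_vectors]
    zeros = dists.count(0.0)
    if zeros:
        # The document coincides with a seed: all truth goes to that cluster.
        truth = [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    else:
        inverse = [1.0 / d for d in dists]
        total = sum(inverse)
        truth = [v / total for v in inverse]
    ranked = sorted(truth, reverse=True)
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    # High when the two best clusters are almost equally plausible.
    indeterminacy = 1.0 - (ranked[0] - runner_up)
    return [(t, indeterminacy, 1.0 - t) for t in truth]


def cluster(texts, k):
    """Run the three phases and return (labels, per-document T/I/F triples)."""
    vocabulary = sorted({w for t in texts for w in t.lower().split()})
    vectors = [relative_frequencies(t, vocabulary) for t in texts]
    seed_vectors = [vectors[i] for i in pick_seeds(vectors, k)]
    triples = [tif_values(v, seed_vectors) for v in vectors]
    # Assign each document to the cluster with the highest truth value.
    labels = [max(range(k), key=lambda c: t[c][0]) for t in triples]
    return labels, triples


if __name__ == "__main__":
    docs = ["cat cat dog", "cat dog dog",
            "stock market stock", "market stock market"]
    labels, triples = cluster(docs, k=2)
    for doc, label, tif in zip(docs, labels, triples):
        print(label, [tuple(round(x, 3) for x in t) for t in tif], doc)
```

In this sketch a document is always assigned to the cluster with the largest T; a fuller treatment might instead flag documents whose I value exceeds some threshold as undecided, which is the situation the indeterminacy component is meant to capture.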

Keywords
Clustering, Neutrosophic logic, Fuzzy logic, K-means.