Hybrid Combination of Error Back Propagation and Genetic Algorithm for Text Document Clustering

© 2020 by IJCTT Journal
Volume-68 Issue-11
Year of Publication : 2020
Authors : Ashwani Mathur
DOI :  10.14445/22312803/IJCTT-V68I11P109

How to Cite?

Ashwani Mathur, "Hybrid Combination of Error Back Propagation and Genetic Algorithm for Text Document Clustering," International Journal of Computer Trends and Technology, vol. 68, no. 11, pp. 64-68, 2020. Crossref, 10.14445/22312803/IJCTT-V68I11P109

High-dimensional text data need clustering, yet clustering is an important and difficult task when automation is required. Many researchers are working in this field to reduce manual operation and the need to pass background information. This paper proposes a model for document clustering that requires no background information. Document term features were extracted and collected in a matrix according to their term frequency values. A genetic algorithm was applied to group the terms into clusters according to the similarity of their content. Term frequency distance was the evaluation measure used to compute the fitness of each chromosome. Cluster centers representing document terms were obtained from the genetic algorithm. The output of the genetic algorithm was used as a training vector for identifying the document cluster class. The experiment was conducted on a real dataset of research articles from various fields of engineering. The results show that the proposed model increases the precision, recall, and accuracy of document clustering.
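
The first two stages described in the abstract (a term frequency matrix, then a genetic algorithm whose fitness is a term frequency distance) can be illustrated with a minimal Python sketch. It is not the paper's implementation. Everything specific in it is an assumption made for illustration: the chromosome encoding (each chromosome is a list of k document indices acting as cluster centers), the Euclidean distance, the truncation selection, the crossover and mutation operators, the parameter values, and the four toy documents. The later stage, in which the genetic algorithm output trains a classifier, is not shown.

```python
import random
from collections import Counter

def tf_matrix(docs):
    """Return (vocabulary, rows); rows[i][j] counts term j in document i."""
    vocab = sorted({t for d in docs for t in d.lower().split()})
    rows = []
    for d in docs:
        counts = Counter(d.lower().split())
        rows.append([counts[t] for t in vocab])
    return vocab, rows

def distance(a, b):
    """Euclidean distance between two term frequency rows (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def fitness(chromosome, rows):
    """Total distance from each document to its nearest center; lower is better."""
    return sum(min(distance(r, rows[c]) for c in chromosome) for r in rows)

def assign(chromosome, rows):
    """Label each document with the position of its nearest center."""
    k = len(chromosome)
    return [min(range(k), key=lambda j: distance(r, rows[chromosome[j]]))
            for r in rows]

def evolve(rows, k, pop_size=20, generations=40, mutation_rate=0.2, seed=0):
    """Search for k center documents with a small elitist genetic algorithm."""
    rng = random.Random(seed)
    n = len(rows)
    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, rows))
        parents = population[: pop_size // 2]      # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            pool = list(dict.fromkeys(p1 + p2))    # distinct genes of both parents
            child = rng.sample(pool, k)            # crossover: draw k of them
            if rng.random() < mutation_rate:       # mutation: swap in a new center
                unused = [i for i in range(n) if i not in child]
                if unused:
                    child[rng.randrange(k)] = rng.choice(unused)
            children.append(child)
        population = parents + children
    return min(population, key=lambda c: fitness(c, rows))

# Hypothetical toy corpus: two neural network texts, two civil engineering texts.
docs = [
    "neural network training weights network",
    "network weights neural layers training",
    "bridge concrete load beam concrete",
    "beam load bridge steel concrete",
]
vocab, rows = tf_matrix(docs)
best = evolve(rows, k=2)
labels = assign(best, rows)
print("centers:", best, "labels:", labels)
```

Using documents themselves as centers keeps every chromosome valid after crossover and mutation, which is why this sketch prefers it to evolving free-floating center vectors; the paper may encode chromosomes differently.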

Clustering, Document Clustering, Genetic Algorithm, Text Mining, Pattern Feature.