Vector Space Models to Classify Arabic Text
Jafar Ababneh, Omar Almomani, Wael Hadi, Nidhal Kamel Taha El-Omari, Ali Al-Ibrahim. Article: Vector Space Models to Classify Arabic Text. International Journal of Computer Trends and Technology (IJCTT) 7(4):219-223, January 2014. Published by Seventh Sense Research Group.
Abstract-
Text classification is one of the most important tasks in data mining. This paper investigates different variations of vector space models (VSMs) using the k-nearest neighbour (KNN) algorithm. The variations are compared using the most popular text evaluation measures. Experimental results on the Saudi data sets show that the Cosine coefficient outperformed the Dice and Jaccard coefficients.
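To make the comparison concrete, the following is a minimal sketch of KNN classification over a vector space model with the three similarity coefficients named above. It is not the authors' implementation. The raw term-frequency weighting, the weighted (Tanimoto) form of the Jaccard coefficient, the toy English tokens, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter

# Documents are sparse vectors: dicts mapping term -> weight.
# Raw term frequency is assumed here; the paper's weighting may differ.

def dot(u, v):
    """Inner product over the terms the two vectors share."""
    return sum(w * v[t] for t, w in u.items() if t in v)

def sq_norm(u):
    """Squared Euclidean length of a sparse vector."""
    return sum(w * w for w in u.values())

def cosine(u, v):
    denom = math.sqrt(sq_norm(u)) * math.sqrt(sq_norm(v))
    return dot(u, v) / denom if denom else 0.0

def dice(u, v):
    denom = sq_norm(u) + sq_norm(v)
    return 2 * dot(u, v) / denom if denom else 0.0

def jaccard(u, v):
    # Weighted (Tanimoto) generalisation of the set-based Jaccard coefficient.
    d = dot(u, v)
    denom = sq_norm(u) + sq_norm(v) - d
    return d / denom if denom else 0.0

def knn_classify(query, training, similarity, k=3):
    """Label the query by majority vote among its k most similar training documents."""
    ranked = sorted(training, key=lambda item: similarity(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus (placeholder tokens, not the Saudi data sets).
training = [
    ({"goal": 2, "match": 1}, "sport"),
    ({"team": 1, "match": 2}, "sport"),
    ({"bank": 2, "market": 1}, "economy"),
    ({"market": 2, "price": 1}, "economy"),
]
query = {"match": 1, "team": 1}

for measure in (cosine, dice, jaccard):
    print(measure.__name__, knn_classify(query, training, measure, k=3))
```

All three coefficients share the same numerator (the inner product) and differ only in how they normalise it, so on small, clean examples they tend to rank neighbours similarly. Differences of the kind reported in the abstract would only show up on a real corpus with a chosen term-weighting scheme.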
Keywords-Arabic data sets, Data mining, Text categorization, Term weighting, VSM.