Application of Data Mining Classification Algorithms for Afaan Oromo Media Text News Categorization

International Journal of Computer Trends and Technology (IJCTT)          
 
© 2019 by IJCTT Journal
Volume-67 Issue-7
Year of Publication : 2019
Authors :  Etana Fikadu Dinsa, Ramesh Babu P
DOI :  10.14445/22312803/IJCTT-V67I7P112

MLA Style: Etana Fikadu Dinsa, Ramesh Babu P. "Application of Data Mining Classification Algorithms for Afaan Oromo Media Text News Categorization." International Journal of Computer Trends and Technology 67.7 (2019): 73-79.

APA Style: Etana Fikadu Dinsa, Ramesh Babu P. (2019). Application of Data Mining Classification Algorithms for Afaan Oromo Media Text News Categorization. International Journal of Computer Trends and Technology, 67(7), 73-79.

Abstract
This research proposes an Afaan Oromo text categorization model that automatically assigns texts to predefined classes. Text categorization is the task of assigning an electronic document to one or more categories based on its contents. Document classification can be done manually or automatically. Manual text categorization is carried out by human experts and requires a certain level of vocabulary recognition and knowledge processing. Automatic classification assigns documents to a number of classes using machine learning methods; it reduces searching time and thereby facilitates the search process. In this research, we deal with itemset-based categorization of Afaan Oromo news documents using the Apriori algorithm. In text document categorization, each word contained in a document is referred to as an item. As part of this work, the Apriori algorithm is used to generate the frequent itemsets of a given text document. Among the automatic classifiers applicable to high-dimensional data, two, Naïve Bayes (NB) and Bayes network, were tested on the full dataset. The pre-processed Afaan Oromo text items are organized into nine classes for the experiments, which use 10-fold stratified cross-validation to obtain training and test data. Classification performance is analyzed to measure the accuracy of the classifiers in categorizing Afaan Oromo news documents into the specified categories. On the nine-category data, the best accuracy obtained is 97.15% for the Bayes network classifier and 95.666% for Naïve Bayes (NB). These results indicate that the Bayes network classifier is better suited to categorizing Afaan Oromo news documents.
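
To make the itemset idea concrete, the following is a minimal sketch of Apriori-style frequent itemset mining in which each document is treated as a set of word items. It is not the authors' implementation: the function name `apriori`, the toy English tokens (standing in for pre-processed Afaan Oromo words), and the support threshold of 3 documents are all assumptions chosen for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset that occurs in at
    least `min_support` transactions. Hypothetical helper for illustration."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many documents contain each single word.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count the support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Toy corpus (assumed data): each inner list is one tokenized news document.
docs = [
    ["team", "goal", "match"],
    ["team", "goal", "coach"],
    ["team", "match", "goal"],
    ["market", "price", "trade"],
    ["market", "price", "team"],
]

frequent = apriori(docs, min_support=3)
for itemset, support in sorted(frequent.items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

In a pipeline of the kind the abstract describes, frequent itemsets like these could serve as features for a classifier such as Naïve Bayes; how the authors actually encode them is not stated here.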

Keywords
Categorization, Data mining, Apriori, Classifier algorithms, Machine learning, Itemset