Document Classification of Assamese Text Using Naïve Bayes Approach

International Journal of Computer Trends and Technology (IJCTT)          
© 2015 by IJCTT Journal
Volume-30 Number-4
Year of Publication : 2015
Authors : Moromi Gogoi, Shikhar Kumar Sarma


Moromi Gogoi, Shikhar Kumar Sarma "Document Classification of Assamese Text Using Naïve Bayes Approach". International Journal of Computer Trends and Technology (IJCTT) V30(4):182-186, December 2015. ISSN:2231-2803. Published by Seventh Sense Research Group.

Abstract -
Document classification has become an active area of research owing to the abundance of documents available in digital form. It can be used to organize documents into smaller, meaningful classes. Correctly assigning a document to a particular class remains a major challenge, particularly for Assamese text, as very little work has been done in this area. In this paper, we perform document classification using a Naïve Bayes classifier. Among the various classification approaches, Naïve Bayes is a promising document classification model because of its simplicity. The aim of this paper is to highlight the performance of Naïve Bayes for document classification. Each document is classified into one of four classes: sports, politics, law and science. To build and evaluate the classification model, a total of 200 documents are split into two datasets, a training set and a testing set: 60% of the documents (120) are used for training and the remaining 40% (80) for testing. The results have been validated using the statistical measures of precision, recall and their combination, the F-measure. The results show that Naïve Bayes is a good classifier for this task.
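
The abstract does not state which Naïve Bayes variant, tokenizer or splitting procedure was used, so the following is only a minimal Python sketch of the pipeline it describes, under stated assumptions: the multinomial event model with add-one (Laplace) smoothing, plain whitespace tokenization, and a 60/40 split stratified by class. A document is assigned to the class that maximizes the log prior plus the summed log likelihoods of its tokens, and precision, recall and F-measure are computed per class. All function names (`tokenize`, `train_multinomial_nb`, `predict`, `stratified_split`, `precision_recall_f1`) are hypothetical, and the toy documents are English placeholders, not the authors' Assamese corpus.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization. Real Assamese text would
    # likely also need punctuation (e.g. the danda "।") and stop-word handling.
    return text.split()


def train_multinomial_nb(docs, labels):
    """Estimate log class priors and add-one-smoothed log term likelihoods."""
    class_doc_counts = Counter(labels)
    term_counts = defaultdict(Counter)  # class -> term -> frequency
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        term_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = len(docs)
    log_prior = {c: math.log(n / n_docs) for c, n in class_doc_counts.items()}
    log_likelihood = {}
    for c in class_doc_counts:
        denom = sum(term_counts[c].values()) + len(vocab)
        log_likelihood[c] = {
            t: math.log((term_counts[c][t] + 1) / denom) for t in vocab
        }
    return log_prior, log_likelihood, vocab


def predict(model, doc):
    """Return the class with the highest log posterior (up to a constant)."""
    log_prior, log_likelihood, vocab = model
    scores = {}
    for c, prior in log_prior.items():
        # Terms never seen in training are skipped.
        scores[c] = prior + sum(
            log_likelihood[c][t] for t in tokenize(doc) if t in vocab
        )
    return max(scores, key=scores.get)


def stratified_split(docs, labels, train_fraction=0.6, seed=0):
    """Shuffle each class separately and keep train_fraction of it for training."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_class[label].append(doc)
    train_x, train_y, test_x, test_y = [], [], [], []
    for label in sorted(by_class):
        items = list(by_class[label])
        rng.shuffle(items)
        cut = round(train_fraction * len(items))
        train_x.extend(items[:cut])
        train_y.extend([label] * cut)
        test_x.extend(items[cut:])
        test_y.extend([label] * (len(items) - cut))
    return train_x, train_y, test_x, test_y


def precision_recall_f1(gold, pred, cls):
    """Per-class precision, recall and F-measure (harmonic mean of P and R)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy placeholder documents for the paper's four classes (NOT real data).
    docs = ["match goal team", "election vote party",
            "court judge verdict", "atom energy experiment"] * 5
    labels = ["sports", "politics", "law", "science"] * 5
    train_x, train_y, test_x, test_y = stratified_split(docs, labels)
    model = train_multinomial_nb(train_x, train_y)
    predictions = [predict(model, d) for d in test_x]
    for c in sorted(set(labels)):
        print(c, precision_recall_f1(test_y, predictions, c))
```

Stratifying the split is a choice made here so that every class is represented in training; the abstract reports only the overall 60/40 proportion, not how documents were assigned to each set.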

Keywords - Document classification, Naïve Bayes.