A Survey On Text Categorization

International Journal of Computer Trends and Technology (IJCTT)
© 2012 by IJCTT Journal
Volume 3, Issue 1
Year of Publication: 2012
Authors: S.Niharika, V.Sneha Latha, D.R.Lavanya


S.Niharika, V.Sneha Latha, D.R.Lavanya. "A Survey On Text Categorization". International Journal of Computer Trends and Technology (IJCTT), V3(1):735-741, Issue 2012. ISSN 2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract: Managing vast numbers of documents in digital form is now an important task in text mining applications. Text categorization is the task of automatically sorting a set of documents into categories from a predefined set. A major difficulty of text categorization is the high dimensionality of the feature space. Reducing this dimensionality by selecting a subset of the original attributes is known as feature selection. This paper discusses feature selection methods, which reduce the dimensionality of a dataset by removing features considered irrelevant for classification. It also discusses several approaches to text categorization and the applications of text categorization.
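
To make the idea of feature selection concrete, the following is a minimal sketch that ranks terms by a chi-square statistic against one target category and keeps the top k. The abstract does not name a specific scoring function, so the choice of chi-square, the whitespace tokenizer, the toy documents, and the function names `chi_square_scores` and `select_features` are all assumptions made for illustration.

```python
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()

def chi_square_scores(docs, labels, target):
    """Score each term by the chi-square statistic of its 2x2
    term/category contingency table for the `target` category."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n - n_pos
    df_pos, df_neg = Counter(), Counter()  # document frequencies per class
    for text, y in zip(docs, labels):
        terms = set(tokenize(text))  # count each term once per document
        (df_pos if y == target else df_neg).update(terms)
    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # in category, contains term
        b = df_neg[term]   # outside category, contains term
        c = n_pos - a      # in category, lacks term
        d = n_neg - b      # outside category, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores

def select_features(docs, labels, target, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels, target)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Hypothetical toy corpus, not taken from the paper.
docs = [
    "cheap pills buy now",
    "buy cheap watches",
    "meeting agenda for monday",
    "project meeting notes",
]
labels = ["spam", "spam", "ham", "ham"]

print(select_features(docs, labels, "spam", 3))
```

Terms that appear in every document of one class and in none of the other receive the highest score, so the selected subset keeps the most discriminative terms and discards the rest of the vocabulary.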



Keywords: Text mining, text classification, feature selection.