Enhancing Binary Classification by Modeling Uncertain Boundary in Three Way Decision using Multi Document Classification
International Journal of Computer Trends and Technology (IJCTT)
© 2018 by IJCTT Journal
Volume-57 Number-2
Year of Publication: 2018
Authors: N. Murugan
DOI: 10.14445/22312803/IJCTT-V57P116
N. Murugan, "Enhancing Binary Classification by Modeling Uncertain Boundary in Three Way Decision using Multi Document Classification," International Journal of Computer Trends and Technology (IJCTT), V57(2):84-87, March 2018. ISSN: 2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.
Abstract
Text classification is the process of assigning documents to predefined categories using classifiers learned from labeled or unlabeled training samples. Binary text classification attempts to find a more effective way to separate relevant texts from a large dataset. Current text classifiers cannot explain the decision boundary between positive and negative objects because of the uncertainties introduced by text feature selection and by the knowledge learning process. This paper proposes a three-way decision model, based on rough set techniques and a centroid solution, that deals with this uncertain boundary in order to improve binary text classification performance. Its ultimate aim is to make the uncertain boundary understandable by using two main boundary vectors to divide the training samples into three regions: positive, boundary, and negative. Four decision rules are derived during training and applied to incoming documents for further classification. Extensive experiments have been conducted on the standard RCV1 data sets. Compared with six other popular baseline models, the proposed model significantly improves binary text classification performance in terms of the evaluation measure and accuracy.
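The abstract describes the mechanism only at a high level. As a rough illustration of the general three-way decision idea (accept, reject, or defer), the sketch below scores a document by its cosine similarity to positive and negative training centroids and uses two thresholds to place it in the positive, boundary, or negative region. The score function, the thresholds `alpha` and `beta`, and every function name here are hypothetical assumptions; the paper's two boundary vectors and four decision rules are not specified in this abstract, so this is a minimal sketch of the general technique, not the author's method.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(docs: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length document vectors."""
    n = len(docs)
    return [sum(col) / n for col in zip(*docs)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def centroid_score(doc: Vector, pos_c: Vector, neg_c: Vector) -> float:
    """Hypothetical score: > 0 means the document is closer to the positive centroid."""
    return cosine(doc, pos_c) - cosine(doc, neg_c)


def three_way_decision(doc: Vector, pos_c: Vector, neg_c: Vector,
                       alpha: float, beta: float) -> str:
    """Accept (positive), reject (negative) or defer (boundary) using alpha > beta."""
    if not alpha > beta:
        raise ValueError("alpha must be greater than beta")
    s = centroid_score(doc, pos_c, neg_c)
    if s >= alpha:
        return "positive"
    if s <= beta:
        return "negative"
    # Too close to call: defer to a later rule instead of forcing a binary label.
    return "boundary"


if __name__ == "__main__":
    # Toy 3-term vectors (made-up data, for illustration only).
    pos_c = centroid([[1.0, 0.0, 0.2], [0.9, 0.1, 0.0]])
    neg_c = centroid([[0.0, 1.0, 0.1], [0.1, 0.8, 0.3]])
    for doc in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]):
        print(doc, three_way_decision(doc, pos_c, neg_c, alpha=0.3, beta=-0.3))
```

Documents that land in the boundary region are the ones a plain binary classifier would label with little margin; in a three-way model they are set aside so that additional rules can decide them, rather than being forced into the positive or negative class.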
Keywords
Uncertain decision boundary, text classification, three-way decision, rough set, decision rule