Enhanced Classification to Counter the Problem of Cluster Disjuncts

International Journal of Computer Trends and Technology (IJCTT)          
© 2014 by IJCTT Journal
Volume-18 Number-5
Year of Publication : 2014
Authors : Syed Ziaur Rahman, Dr. G. Samuel Vara Prasad Raju
DOI :  10.14445/22312803/IJCTT-V18P148


Syed Ziaur Rahman, Dr. G. Samuel Vara Prasad Raju, "Enhanced Classification to Counter the Problem of Cluster Disjuncts". International Journal of Computer Trends and Technology (IJCTT) V18(5):217-224, Dec 2014. ISSN: 2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract -
This paper presents a rigorous yet practical model, the Cluster Disjunct Minority Oversampling Technique (CDMOTE), for learning from skewed training data. The algorithm provides a simpler and faster alternative by using the cluster disjunct concept. We conduct experiments on fifteen UCI data sets from various application domains, using five algorithms for comparison and six evaluation metrics. The empirical study suggests that CDMOTE is effective in addressing the class imbalance problem.

Keywords -
Classification, class imbalance, cluster disjunct, CDMOTE.