Feature Selection to Reduce Dimensionality of Heart Disease Dataset Without Compromising Accuracy

International Journal of Computer Trends and Technology (IJCTT)          
© 2019 by IJCTT Journal
Volume-67 Issue-6
Year of Publication : 2019
Authors : Shiwani Gupta, R. R. Sedamkar
DOI :  10.14445/22312803/IJCTT-V67I6P109


MLA Style: Shiwani Gupta, R. R. Sedamkar. "Feature Selection to Reduce Dimensionality of Heart Disease Dataset Without Compromising Accuracy." International Journal of Computer Trends and Technology 67.6 (2019): 57-64.

APA Style: Shiwani Gupta, R. R. Sedamkar. Feature Selection to Reduce Dimensionality of Heart Disease Dataset Without Compromising Accuracy. International Journal of Computer Trends and Technology, 67(6), 57-64.

The performance of machine classification depends heavily on the selection of features, and in the medical field, collecting data is costly. The risk of overfitting grows when the number of observations is insufficient, and computation time becomes significant when the number of features is large. It would therefore be better if machines could extract the most informative features, i.e. medically high-risk factors, to reduce the cost overhead on patients. Feature selection is essential for simpler, faster, more reliable, and more robust machine learning models. Since wrapper-based methods are computationally expensive and filter-based methods are quicker, the authors claim through experimentation that filter-based feature selection followed by a wrapper method can considerably reduce the size of the feature set and enhance the accuracy of prediction models on high-dimensional datasets without having to increase the number of instances. Results are demonstrated on the Arrhythmia dataset from the UCI Machine Learning Repository, with 280 features, and on the Z-Alizadeh Sani dataset, with 55 features.
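
The filter-then-wrapper idea described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the specific filters, wrappers, or classifiers used, so everything here is an assumption chosen for illustration. The filter stage ranks features by absolute Pearson correlation with the class label, the wrapper stage is greedy forward selection scored by leave-one-out accuracy of a nearest-centroid classifier, and the data are synthetic (20 features, of which only the first two carry signal). The function names `filter_stage`, `wrapper_stage`, and `loo_accuracy` are hypothetical.

```python
import random
from statistics import mean, pstdev

def pearson_abs(xs, ys):
    """Absolute Pearson correlation; 0.0 if either column is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return abs(cov / (sx * sy))

def filter_stage(X, y, keep):
    """Cheap filter: keep the `keep` features most correlated with the label."""
    n_features = len(X[0])
    scores = [pearson_abs([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:keep]

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on `features`."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [mean(r[j] for r in rows) for j in features]
        point = [X[i][j] for j in features]
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
        )
        correct += pred == y[i]
    return correct / len(X)

def wrapper_stage(X, y, candidates):
    """Costly wrapper: greedily add the feature that most improves accuracy."""
    selected, best = [], 0.0
    while True:
        trials = [
            (loo_accuracy(X, y, selected + [j]), j)
            for j in candidates
            if j not in selected
        ]
        if not trials:
            break
        acc, j = max(trials)
        if acc <= best:  # stop when no remaining feature helps
            break
        selected.append(j)
        best = acc
    return selected, best

# Synthetic stand-in data (an assumption, not the paper's datasets):
# 60 instances, 20 features; only features 0 and 1 depend on the label.
random.seed(0)
y = [i % 2 for i in range(60)]
X = [
    [label * 2.0 + random.gauss(0, 1) if j < 2 else random.gauss(0, 1)
     for j in range(20)]
    for label in y
]

candidates = filter_stage(X, y, keep=5)          # 20 features -> 5 candidates
selected, acc = wrapper_stage(X, y, candidates)  # 5 candidates -> final subset
print("filter kept:", candidates)
print("wrapper kept:", selected, "LOO accuracy:", round(acc, 3))
```

The point of the ordering is cost: the wrapper retrains and re-evaluates a classifier for every candidate subset it tries, so running it only on the few features that survive the filter keeps the number of evaluations small.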

Keywords : feature selection, heart disease, accuracy, filter, wrapper.