Diabetes Classification Using Cascaded Data Mining Technique

Joseph N. Mamman; Muhammad B. Abdullahi; Abiodun M. Aibinu; Ibrahim M. Abdullahi

doi:10.14445/22312803/IJCTT-V22P111

Research Article | Open Access

Volume 22 | Number 1 | Year 2015 | Article Id. IJCTT-V22P111 | DOI : https://doi.org/10.14445/22312803/IJCTT-V22P111

Citation

Joseph N. Mamman, Muhammad B. Abdullahi, Abiodun M. Aibinu, Ibrahim M. Abdullahi, "Diabetes Classification Using Cascaded Data Mining Technique," International Journal of Computer Trends and Technology (IJCTT), vol. 22, no. 1, pp. 53-63, 2015. Crossref, https://doi.org/10.14445/22312803/IJCTT-V22P111

Abstract

Clustering plays a major role in data mining: it is used to build models from an input data set, predict future data trends for decision making, simulate and analyse models, and diagnose diseases. Currently, in the diagnosis of diseases such as diabetes, Artificial Intelligence (AI) techniques used for data pre-processing and classification require prior knowledge of the clustered data. However, the lack of such prior knowledge has led to lower classification accuracy. In this work, a cascade of the K-Means clustering algorithm and an Artificial Neural Network (ANN) is proposed for clustering of a diabetes dataset. The proposed model was implemented in two stages. In the first stage, K-Means clustering was used to pre-process the dataset after an initial filtering operation. In the second stage, the ANN was used to classify the pre-processed dataset. The proposed cascaded model was applied to the Pima Indian Diabetes Dataset (PIDD), obtained from a public repository. Experimental results show that the K-Means-ANN model achieved an accuracy of 99.2%. Further analysis also revealed that the K-Means-ANN cascade outperformed the ANN-K-Means cascade, establishing that the order of the two stages matters: the cascade is not commutative.
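The two-stage cascade described in the abstract can be illustrated with a minimal sketch. The abstract does not say how K-Means pre-processes the data, so the sketch assumes one possible reading: samples whose class label disagrees with the majority label of their cluster are dropped before the ANN is trained. The data are synthetic two-dimensional points rather than PIDD, the network is a small one-hidden-layer perceptron, and all function names (`kmeans`, `filter_by_cluster`, `train_mlp`, `predict`) are hypothetical. It is not the authors' implementation and does not reproduce the reported 99.2%.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def filter_by_cluster(X, y, assign, k):
    """ASSUMED pre-processing: keep samples matching their cluster's majority label."""
    majority = {}
    for c in range(k):
        labels = [lab for lab, a in zip(y, assign) if a == c]
        if labels:
            majority[c] = max(set(labels), key=labels.count)
    kept = [(x, lab) for x, lab, a in zip(X, y, assign) if majority[a] == lab]
    return [x for x, _ in kept], [lab for _, lab in kept]


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_mlp(X, y, hidden=4, epochs=200, lr=0.1, seed=0):
    """One-hidden-layer network trained by stochastic gradient descent."""
    rng = random.Random(seed)
    n_in = len(X[0])
    # The last weight in every row acts as the bias term.
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            xb = X[idx] + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in W1]
            hb = h + [1.0]
            out = sigmoid(sum(w * v for w, v in zip(W2, hb)))
            d_out = out - y[idx]  # cross-entropy gradient at the output unit
            for j in range(hidden):
                d_h = d_out * W2[j] * h[j] * (1.0 - h[j])
                for i in range(n_in + 1):
                    W1[j][i] -= lr * d_h * xb[i]
            for j in range(hidden + 1):
                W2[j] -= lr * d_out * hb[j]
    return W1, W2


def predict(model, x):
    W1, W2 = model
    xb = x + [1.0]
    hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in W1] + [1.0]
    return 1 if sigmoid(sum(w * v for w, v in zip(W2, hb))) >= 0.5 else 0


# Synthetic stand-in for a two-class dataset (NOT the Pima data).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(100)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(100)]
y = [0] * 100 + [1] * 100

# Stage 1: K-Means pre-processing. Stage 2: ANN classification.
assign = kmeans(X, k=2)
X_clean, y_clean = filter_by_cluster(X, y, assign, k=2)
model = train_mlp(X_clean, y_clean)
accuracy = sum(predict(model, x) == t for x, t in zip(X_clean, y_clean)) / len(y_clean)
print(f"kept {len(X_clean)} of {len(X)} samples, training accuracy {accuracy:.3f}")
```

Reversing the stages (training the ANN first and clustering afterwards) would give the second stage different inputs, which is consistent with the abstract's observation that the order of the cascade matters. Note that the accuracy printed here is measured on the filtered training samples only; a proper evaluation would use held-out data.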

Keywords

Data mining, diabetes disease, Pima Indian Diabetes Dataset, ANN, K-means clustering, Pre-Processed Data, Classification
