Diabetes Classification Using Cascaded Data Mining Technique

Volume-22 Number-1
Year of Publication : 2015
Authors : Joseph N. Mamman, Muhammad B. Abdullahi, Abiodun M. Aibinu, Ibrahim M. Abdullahi
DOI : 10.14445/22312803/IJCTT-V22P111


Clustering plays a major role in data mining: it is used to build models from an input data set, predict future data trends for further decision making, simulate and analyse models, and diagnose diseases. Currently, in the diagnosis of diseases such as diabetes, using Artificial Intelligence (AI) techniques for data pre-processing and classification requires prior knowledge of the clustered data. However, the lack of such prior knowledge has led to lower classification accuracy. In this work, a cascade of the K-Means clustering algorithm and an Artificial Neural Network (ANN) is proposed for clustering of a diabetes dataset. The proposed model was implemented in two stages. In the first stage, K-Means clustering was used to pre-process the dataset after an initial filtering operation. In the second stage, the ANN was used to classify the pre-processed dataset. The proposed cascaded model was applied to the Pima Indian Diabetes Dataset (PIDD), obtained from a public repository. Experimental results show that the K-Means-ANN model achieved an accuracy of 99.2%. Further analysis revealed that the K-Means-ANN cascade outperformed the ANN-K-Means cascade, establishing that the two cascaded models are not commutative.
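The two-stage cascade described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the abstract does not say how the K-Means output is used, so the sketch assumes that a sample is kept only when its class label agrees with the majority label of its cluster; the data are a small synthetic two-class set with a few deliberately flipped labels, not PIDD; and the helper names (kmeans, filter_by_cluster, train_ann, predict, make_toy_data), the farthest-point initialisation, the network size, and the learning rate are all hypothetical.

```python
import math
import random


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's K-Means. Farthest-point initialisation is an assumption,
    chosen here only to keep the sketch deterministic."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def filter_by_cluster(X, y, k=2):
    """Stage 1 (assumed rule): keep samples whose label matches the
    majority label of the cluster they fall into."""
    _, labels = kmeans(X, k)
    majority = {}
    for c in set(labels):
        votes = [t for t, lab in zip(y, labels) if lab == c]
        majority[c] = 1 if 2 * sum(votes) > len(votes) else 0
    keep = [i for i, lab in enumerate(labels) if y[i] == majority[lab]]
    return [X[i] for i in keep], [y[i] for i in keep]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, z))))


def forward(model, x):
    W1, W2 = model  # the last weight in every row is the bias
    h = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in W1]
    o = sigmoid(sum(w * v for w, v in zip(W2, h)) + W2[-1])
    return h, o


def train_ann(X, y, hidden=3, epochs=300, lr=0.3, seed=0):
    """Stage 2: one-hidden-layer network trained with plain SGD on the
    cross-entropy loss (hyperparameters are illustrative)."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, t = X[i], y[i]
            h, o = forward((W1, W2), x)
            d_out = o - t  # gradient w.r.t. the output pre-activation
            d_hid = [d_out * W2[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            for j in range(hidden):
                W2[j] -= lr * d_out * h[j]
                for m in range(n_in):
                    W1[j][m] -= lr * d_hid[j] * x[m]
                W1[j][-1] -= lr * d_hid[j]
            W2[-1] -= lr * d_out
    return W1, W2


def predict(model, x):
    return 1 if forward(model, x)[1] >= 0.5 else 0


def make_toy_data(n_per_class=30, n_flipped=3, seed=0):
    """Synthetic stand-in for a real dataset: two separated blobs, with
    the first n_flipped labels of each blob deliberately flipped."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 3.0)):
        for i in range(n_per_class):
            X.append([centre + rng.uniform(-0.5, 0.5), centre + rng.uniform(-0.5, 0.5)])
            y.append(1 - label if i < n_flipped else label)
    return X, y


X, y = make_toy_data()
X_clean, y_clean = filter_by_cluster(X, y)   # stage 1: K-Means pre-processing
model = train_ann(X_clean, y_clean)          # stage 2: ANN classification
accuracy = sum(predict(model, x) == t for x, t in zip(X_clean, y_clean)) / len(y_clean)
print(f"kept {len(X_clean)} of {len(X)} samples; training accuracy {accuracy:.2f}")
```

On this toy set the filtering stage drops the six deliberately mislabelled points before the network is trained. The accuracy printed here is measured on the filtered training samples of synthetic data, so it says nothing about the 99.2% reported for PIDD; reversing the order of the two stages would require a different pre-processing rule and is not covered by this sketch.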

Data mining, diabetes disease, Pima Indian Diabetes Dataset, ANN, K-means clustering, Pre-Processed Data, Classification