CSE-DT Features Selection Technique for Diabetes Classification

Matthew T. Ogedengbe, Charity O. Egbunu


Diabetes has become one of the world's deadliest diseases. It is a condition in which the blood sugar level in the body is elevated, and most people living with it develop complications in various organs if it remains undetected and untreated at an early stage. Most studies in the literature treat all the features of a diabetes dataset as risk factors when diagnosing diabetes. Because every feature is then involved in the classification process, this has resulted in low classification accuracy and longer execution times. Selecting only the most relevant features as risk factors improves classifier performance in terms of classification accuracy and other performance measures. This paper presents a feature selection technique called Classifier Subset Evaluator (CSE), which selects the risk factors most relevant to the prevalence of diabetes. The selected features (risk factors) were passed to a J48 decision tree (DT) classifier for training and testing, and the DT classified all instances of the dataset based on these selected features. The CSE and DT were hybridized into the proposed Classifier Subset Evaluator Decision Tree (CSE-DT). CSE-DT was evaluated on the Pima Indian Diabetes Dataset (PIDD), obtained from the UCI data repository, and implemented in the Waikato Environment for Knowledge Analysis (WEKA). CSE-DT was compared with Naïve Bayes, Support Vector Machine (SVM) and Decision Tree classifiers using F-Measure, Precision, ROC, Recall and Accuracy as evaluation measures. The results show that CSE-DT attained the highest classification accuracy among the compared classifiers, at 81.64%.
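The abstract describes the CSE step only at a high level. As a rough illustration of the wrapper idea behind a classifier-based subset evaluator (a candidate feature subset is scored by how well a classifier trained on just those features performs), the Python sketch below runs a greedy forward search in which each subset is scored by the k-fold cross-validated accuracy of a small decision tree. Everything here is an assumption for illustration: the forward search strategy, the 5-fold scoring, the toy data and the helper names (`build_tree`, `predict`, `cv_accuracy`, `forward_select`) are hypothetical. The Gini-split tree is only a stand-in for WEKA's J48 (a C4.5 implementation, which uses gain ratio and pruning), so this is not the authors' WEKA configuration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, features, depth=0, max_depth=3):
    """Grow a small binary tree that may split only on the given feature indices.

    Stand-in for J48 (assumption): Gini splits, no pruning.
    Leaves are class labels; internal nodes are (feature, threshold, left, right).
    """
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    best = None  # (weighted Gini, feature, threshold)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no allowed feature can split these rows
        return majority
    _, f, t = best
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], features, depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], features, depth + 1, max_depth))


def predict(tree, row):
    """Follow thresholds down to a leaf label (labels must not be tuples)."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def cv_accuracy(rows, labels, features, k=5, seed=0):
    """Score a feature subset by k-fold cross-validated accuracy of the tree."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        test = set(fold)
        train = [i for i in idx if i not in test]
        tree = build_tree([rows[i] for i in train], [labels[i] for i in train], features)
        correct += sum(predict(tree, rows[i]) == labels[i] for i in fold)
    return correct / len(rows)


def forward_select(rows, labels, n_features, k=5):
    """Greedy forward search: add the feature that most improves the subset score."""
    selected, best_score = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        score, f = max((cv_accuracy(rows, labels, selected + [cand], k), cand)
                       for cand in remaining)
        if score <= best_score:  # stop when no candidate improves the subset
            break
        selected.append(f)
        remaining.remove(f)
        best_score = score
    return selected, best_score


if __name__ == "__main__":
    # Hypothetical toy data: only feature 0 carries the class signal.
    rng = random.Random(42)
    rows, labels = [], []
    for _ in range(60):
        x0 = rng.random()
        rows.append([x0, rng.random(), rng.random()])
        labels.append(int(x0 > 0.5))
    print(forward_select(rows, labels, n_features=3))
```

On the toy data only feature 0 determines the label, so the search should keep feature 0 and stop once adding a noise feature no longer raises cross-validated accuracy. Scoring subsets by cross-validation is a design choice in this sketch; a wrapper evaluator can also score subsets on the training data or on a hold-out split.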


Keywords: Accuracy; Classifier Subset Evaluator; Diabetes; Decision tree; Naïve-Bayes; Support vector machine.
