A Metaheuristic-Optimized Feature Selection for Early-Stage Diabetes Prediction With SHAP-Guided Insight into Influential Attributes

Citation

Rochy, Esmay Azam and Ferdaus, Jannatul and Biswas, Uzzal and Tiang, Jun Jiat and Nahid, Abdullah-Al (2026) A Metaheuristic-Optimized Feature Selection for Early-Stage Diabetes Prediction With SHAP-Guided Insight into Influential Attributes. Journal of Computer Science, 22 (2). pp. 540-551. ISSN 1549-3636

[img] Text
jcssp.2026.540.551.pdf - Published Version
Restricted to Repository staff only

Download (1MB)

Abstract

Diabetes is a metabolic disorder that causes elevated blood glucose. This long-term health condition can lead to cardiovascular diseases, stroke, kidney failure, visual impairment, neuropathy, and even death in critical cases. So, a Computer-Aided Diagnostic (CAD) system is necessary to diagnose diabetes automatically. A clinician can utilize a machine learning-based CAD system that automatically diagnoses many people. This paper will use a Random Forest (RF) classifier for Machine Learning (ML) classification to identify if any individual is diabetic or non-diabetic. In order to increase the accuracy and robustness of the model, the Zebra Optimization Algorithm (ZOA) and the proposed Nomad Zebra Optimization Algorithm (NZOA) are used to identify the most optimal feature sets based on RF subset selection and RFE (Recursive Feature Elimination) technique. Smoking and Age have been identified as the most influential features with a prediction accuracy of 79.86%with a precision of 75.51%, recall of 88.33%, and F1-score of 81.42% using the proposed NZOA. Finally, to further increase the model interpretability and assist physicians in making decisions without any irrationality, SHAP (Shapley Additive Explanations) is used to explain the outputs of the models based on game theory and optimal credit allocation techniques. It also identifies that smoking has the highest impact on our model.

Item Type: Article
Uncontrolled Keywords: Machine Learning, Diabetes Prediction, Diagnosis, Features, Explainable AI, Optimization, Public Health
Subjects: Q Science > QA Mathematics > QA71-90 Instruments and machines
Divisions: Faculty of Artificial Intelligence & Engineering (FAIE)
Depositing User: Ms Suzilawati Abu Samah
Date Deposited: 04 Jun 2026 03:25
Last Modified: 04 Jun 2026 03:25
URII: http://shdl.mmu.edu.my/id/eprint/15916

Downloads

Downloads per month over past year

View ItemEdit (login required)