Diabetes Risk Prediction using Shapley Additive Explanations for Feature Engineering

Citation

Chituru, Chinwe Miracle and Ho, Sin Ban and Chai, Ian (2025) Diabetes Risk Prediction using Shapley Additive Explanations for Feature Engineering. Journal of Informatics and Web Engineering, 4 (2). pp. 18-35. ISSN 2821-370X

1387-Article Text-15184-8-10-20250513.pdf - Published Version
Restricted to Repository staff only


Abstract

Diabetes is prevalent globally, and its prevalence is expected to increase in the next few years. This includes different types of the disease, including type 1 and type 2 diabetes. Several factors contribute to the increase, with dietary decisions and lack of exercise as the main ones. This global health challenge calls for effective prediction and early management of the disease. This research focuses on using the decision tree algorithm to predict the risk of diabetes, and on model interpretability through the integration of SHapley Additive exPlanations (SHAP) for feature engineering. Random forest and gradient boosting models were also developed to identify the risk factors and to compare their predictions with those of the decision tree model. The performance of these classifiers was evaluated using accuracy, F1-score, precision, and recall. Understanding the features that drive predictions can enhance clinical decision-making as much as predictive accuracy does. Using a comprehensive dataset of 520 instances with 17 features, including the target output, the proposed decision tree model achieved an accuracy of 97%. The decision tree model's categorical variables enable straightforward data visualization. After the model was developed, SHAP was applied to interpret its predictions. This is crucial for healthcare practitioners because it provides specific health metrics for identifying patients at high risk of diabetes. Preliminary results indicate that polyuria, polydipsia, and age in combination are predictors of diabetes risk. This study highlights the benefits of integrating SHAP with the decision tree algorithm: predictive capability and transparent model interpretability. It also contributes to the growing body of literature on machine learning in the healthcare industry. The results advocate for applying this methodology in clinical settings, fostering trust in the approach among practitioners and patients alike.
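
To illustrate what a Shapley additive explanation attributes to each feature, the following is a minimal sketch that computes exact Shapley values by brute force for a single prediction. It is not the authors' model or pipeline: the hand-written `toy_model`, its scores, the `age_over_45` threshold, and the `BACKGROUND` rows are all hypothetical and chosen only for illustration. The SHAP tool referred to in the abstract relies on more efficient algorithms; this sketch only shows the underlying definition.

```python
from itertools import combinations
from math import factorial

# Hypothetical binary features, named after predictors mentioned in the abstract.
FEATURES = ["polyuria", "polydipsia", "age_over_45"]


def toy_model(x):
    """Hand-written stand-in for a decision tree; returns a made-up risk score."""
    if x["polyuria"]:
        return 0.9 if x["polydipsia"] else 0.6
    if x["polydipsia"]:
        return 0.5
    return 0.3 if x["age_over_45"] else 0.1


# Illustrative background rows (not real patient data).
BACKGROUND = [
    {"polyuria": 0, "polydipsia": 0, "age_over_45": 0},
    {"polyuria": 0, "polydipsia": 0, "age_over_45": 1},
    {"polyuria": 1, "polydipsia": 0, "age_over_45": 1},
    {"polyuria": 0, "polydipsia": 1, "age_over_45": 0},
]


def coalition_value(model, x, subset, background):
    """Mean model output when features in `subset` take x's values
    and all other features take the values of each background row."""
    total = 0.0
    for row in background:
        mixed = {f: (x[f] if f in subset else row[f]) for f in FEATURES}
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values: weighted average of each feature's marginal
    contribution over all subsets of the remaining features."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        contribution = 0.0
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = coalition_value(model, x, set(subset) | {f}, background)
                without_f = coalition_value(model, x, set(subset), background)
                contribution += weight * (with_f - without_f)
        phi[f] = contribution
    return phi


patient = {"polyuria": 1, "polydipsia": 1, "age_over_45": 0}
base_value = coalition_value(toy_model, patient, set(), BACKGROUND)
phi = shapley_values(toy_model, patient, BACKGROUND)

for name, value in phi.items():
    print(f"{name}: {value:+.4f}")
print(f"base value: {base_value:.4f}, prediction: {toy_model(patient):.4f}")
```

The attributions are additive: the base value (the mean prediction over the background rows) plus the sum of the per-feature values equals the model's output for the explained patient. Brute-force enumeration grows exponentially with the number of features, so it is only practical for small examples like this one.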

Item Type: Article
Uncontrolled Keywords: Diabetes Risk Prediction, Decision Tree Algorithm, Additive Explanations, Feature Engineering, Data Visualization
Subjects: Q Science > Q Science (General)
Divisions: Faculty of Computing and Informatics (FCI)
Depositing User: Ms Suzilawati Abu Samah
Date Deposited: 25 Jun 2025 07:30
Last Modified: 25 Jun 2025 07:30
URI: http://shdl.mmu.edu.my/id/eprint/14007
