Citation
Rahman, Fatima and Hossain, Sheyum and Tiang, Jun Jiat and Nahid, Abdullah-Al (2025) Diabetes Prediction Using Feature Selection Algorithms and Boosting-Based Machine Learning Classifiers. Diagnostics, 15 (20). p. 2622. ISSN 2075-4418
Text
diagnostics-15-02622.pdf - Published Version (2MB). Restricted to repository staff only.
Abstract
Background: Diabetes mellitus is a major global health concern that requires accurate early diagnosis to prevent severe complications. However, accurate prediction remains challenging because available datasets are limited, noisy, and imbalanced. This study proposes a novel machine learning framework for improved diabetes prediction, addressing key challenges such as inadequate feature selection, class imbalance, and data preprocessing. Methods: This work systematically evaluates five feature selection algorithms (Recursive Feature Elimination, Grey Wolf Optimizer, Particle Swarm Optimizer, Genetic Algorithm, and Boruta) using cross-validation, with SHAP analysis to enhance feature interpretability. Classification is performed using two boosting algorithms: the light gradient boosting machine (LightGBM) and extreme gradient boosting (XGBoost). Results: The proposed framework, using the five most important features selected by the Boruta feature selection algorithm, outperformed other configurations with the LightGBM classifier, achieving an accuracy of 85.16%, an F1-score of 85.41%, and a 54.96% reduction in training time. Conclusions: We benchmarked our approach against recent studies and validated its effectiveness on both the Pima Indian Diabetes Dataset and the newly released DiaHealth dataset, demonstrating robust and accurate early diabetes detection across diverse clinical datasets. This approach offers a cost-effective, interpretable, and clinically relevant solution for early diabetes detection by reducing the number of input features, providing transparent feature importance, and achieving high predictive accuracy with efficient model training.
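The abstract reports three kinds of figures: accuracy, F1-score, and a percentage reduction in training time. As a minimal sketch of how such quantities are commonly computed, the Python below uses only the standard library. The function names, the toy labels, and the timings are hypothetical and are not taken from the paper; the binary (positive-class) F1 definition is likewise an assumption, since the abstract does not state which averaging the authors used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def time_reduction_pct(baseline_seconds, reduced_seconds):
    """Percentage by which training time drops relative to a baseline run."""
    return 100.0 * (baseline_seconds - reduced_seconds) / baseline_seconds


# Hypothetical toy labels (1 = diabetic, 0 = non-diabetic), not data from the study.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
# Hypothetical timings: all features vs. a reduced feature subset.
saving = time_reduction_pct(baseline_seconds=2.0, reduced_seconds=1.0)

print(f"accuracy={acc:.2%}  F1={f1:.2%}  training-time reduction={saving:.2f}%")
```

In a feature selection study of this kind, the baseline timing would typically come from training on the full feature set and the reduced timing from training on the selected subset, but the abstract does not specify the baseline used.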
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Boosting classifier algorithms, diabetes prediction, feature selection algorithms (FSAs), machine learning, medical diagnostics |
| Subjects: | R Medicine > RC Internal medicine > RC71-78.7 Examination. Diagnosis |
| Divisions: | Faculty of Artificial Intelligence & Engineering (FAIE) |
| Depositing User: | Nor Afiqah Mohd Adnan |
| Date Deposited: | 10 Dec 2025 01:43 |
| Last Modified: | 10 Dec 2025 01:43 |
| URI: | http://shdl.mmu.edu.my/id/eprint/15002 |
