Citation
Das, Prosenjit and Sarker, Proshenjit and Tiang, Jun Jiat and Nahid, Abdullah-Al (2025) Breast Cancer Prediction Using Rotation Forest Algorithm Along with Finding the Influential Causes. Bioengineering, 12 (10). p. 1020. ISSN 2306-5354
Text
14791.pdf - Published Version (742kB). Restricted to Repository staff only.
Abstract
Breast cancer is a widespread disease involving abnormal (uncontrolled) growth of breast tissue cells along with the formation of a tumor and metastasis. Breast cancer cases occur mostly among women. Early detection and regular screening have significantly improved survival rates. This research classifies breast cancer and non-breast cancer cases using machine learning algorithms based on the Breast Cancer Coimbra dataset by optimizing the classifier performance and feature selection methodology. In addition, this research identifies the influential features responsible for breast cancer classification by using diverse counterfactual explanations. The Rotation Forest classifier algorithm is used to classify breast cancer and non-breast cancer cases. The hyperparameters of this algorithm are optimized using the Optuna optimizer. Three wrapper-based feature selection techniques (Sequential Forward Selection, Sequential Backward Selection, and Exhaustive Feature Selection) are used to select the most relevant features. An ensemble environment is also created using the best feature subsets of these methods, incorporating both soft and hard voting strategies. Experimental results show that the hard voting strategy achieves an accuracy of 85.71%, F1-score of 83.87%, precision of 92.85%, and recall of 76.47%. In contrast, the soft voting strategy obtains an accuracy of 80.00%, F1-score of 77.42%, precision of 85.71%, and recall of 70.59%. These findings demonstrate that hard voting achieves noticeably better performance. The misclassification outcomes of both strategies are explored using Diverse Counterfactual Explanations, revealing that BMI and Glucose values are most influential in predicting the correct class, whereas the HOMA, Adiponectin, and Resistin values have little influence.
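To illustrate the difference between the two voting strategies named in the abstract, the following is a minimal sketch, not the authors' implementation. The three probabilities are hypothetical values standing in for the positive-class outputs of three models (for example, one per feature subset); the function names, the 0.5 threshold, and the tie-breaking rule are assumptions made for this example. The `f1_score` helper only checks that the reported F1-scores are consistent with the reported precision and recall.

```python
def hard_vote(probabilities, threshold=0.5):
    """Majority vote over each model's thresholded label (ties go to class 0)."""
    labels = [1 if p >= threshold else 0 for p in probabilities]
    return 1 if 2 * sum(labels) > len(labels) else 0


def soft_vote(probabilities, threshold=0.5):
    """Threshold the mean of the models' positive-class probabilities."""
    mean_p = sum(probabilities) / len(probabilities)
    return 1 if mean_p >= threshold else 0


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical positive-class probabilities from three models for one sample.
probs = [0.55, 0.60, 0.10]

hard = hard_vote(probs)  # two of three models vote for class 1
soft = soft_vote(probs)  # the mean probability falls below the threshold

# Consistency check using the precision and recall reported in the abstract.
f1_hard = f1_score(0.9285, 0.7647)
f1_soft = f1_score(0.8571, 0.7059)

print(hard, soft, round(f1_hard, 4), round(f1_soft, 4))
```

In this toy case the two strategies disagree: a single confident dissenting model pulls the averaged probability below the threshold, while the majority of thresholded labels still favors the positive class.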
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Breast cancer, Optuna, rotation forest, wrapper method, ensemble voting classifier, counterfactual analysis |
| Subjects: | R Medicine > RC Internal medicine > RC0254 Neoplasms. Tumors. Oncology (including Cancer) |
| Divisions: | Faculty of Artificial Intelligence & Engineering (FAIE) |
| Depositing User: | Nurin Syazwani Azmi |
| Date Deposited: | 10 Nov 2025 00:48 |
| Last Modified: | 10 Nov 2025 00:48 |
| URI: | http://shdl.mmu.edu.my/id/eprint/14791 |
