Precision in prediction: tailoring machine learning models for breast cancer missense variants pathogenicity prediction

Citation

Ahmad, Rahaf M. and AlDhaheri, Noura and Mohamad, Mohd Saberi and Ali, Bassam R. (2025) Precision in prediction: tailoring machine learning models for breast cancer missense variants pathogenicity prediction. Briefings in Bioinformatics, 26 (6). ISSN 1467-5463

Precision in prediction_ tailoring machine learning models for breast cancer missense variants pathogenicity prediction.pdf - Published Version
Restricted to Repository staff only


Abstract

Accurate classification of genetic variants is critical for precision medicine, particularly for hereditary diseases such as breast cancer. However, widely used tools such as MutPred and Combined Annotation Dependent Depletion (CADD) offer genome-wide pathogenicity predictions that often overlook disease-specific variant behavior, limiting their clinical utility. This study addresses that gap by training and benchmarking nine machine learning (ML) models, including ensemble and baseline classifiers, on a breast cancer gene-specific dataset rich in conservation scores, functional annotations, and allele frequency features. Among all models, Extra Trees achieved the highest performance, with an accuracy of 0.999 (95% confidence interval: 0.998–1.000). Recursive feature elimination identified the most informative genomic features, enhancing model efficiency. To ensure clinical transparency, we applied interpretability techniques, including Local Interpretable Model-Agnostic Explanations (LIME) and permutation feature importance, which highlighted the key drivers of each prediction. A calibration curve further confirmed the reliability of the predicted probabilities, supporting their potential use in clinical decision-making. On an independent ClinGen dataset, Extra Trees achieved 99.1% accuracy and outperformed widely used predictors, confirming its robustness and clinical applicability. This is the first comprehensive benchmarking study to apply ML models specifically to breast cancer-related missense variants using disease-gene-specific training data and integrated interpretability. Our results show that disease-specific ML approaches outperform general predictors, offering improved reliability, transparency, and relevance to clinical genomics.
By bridging the gap between broad genome-wide tools and tailored clinical prediction, this study lays the foundation for implementing ML-driven pathogenicity prediction in breast cancer diagnostics and precision medicine, with potential expansion to other disease contexts.
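The abstract names several evaluation steps (an accuracy estimate with a 95% confidence interval, permutation feature importance, and a calibration curve) without stating how each was computed. The stdlib-only Python sketch below shows one common way to compute them, assuming binary labels (1 = pathogenic, 0 = benign) and a model exposed as a plain `predict` function. It is not the authors' implementation: the Wilson score interval is an assumed choice because the abstract does not give the CI method, and every function name and the toy data are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical conventions: labels are 1 = pathogenic, 0 = benign; each variant
# is a row of numeric features (e.g. conservation or allele-frequency scores).
Row = Sequence[float]
Predictor = Callable[[List[Row]], List[int]]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted labels that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def wilson_interval(correct: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion; z = 1.96 gives roughly 95% coverage."""
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def permutation_importance(predict: Predictor, X: List[Row], y: Sequence[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when a single feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [list(row) for row in X]  # copy so X itself is untouched
            for row, value in zip(X_perm, column):
                row[j] = value
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def calibration_points(y_true: Sequence[int], probs: Sequence[float],
                       n_bins: int = 5) -> List[Tuple[float, float]]:
    """(mean predicted probability, observed pathogenic fraction) per non-empty bin."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, probs):
        # Equal-width bins on [0, 1]; a probability of exactly 1.0 goes in the last bin.
        bins[min(int(p * n_bins), n_bins - 1)].append((p, t))
    return [(sum(p for p, _ in b) / len(b), sum(t for _, t in b) / len(b))
            for b in bins if b]


if __name__ == "__main__":
    # Toy data: a stand-in "model" that thresholds feature 0 and ignores feature 1.
    X = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.3], [0.1, 0.9], [0.95, 0.5], [0.05, 0.2]]
    y = [1, 1, 0, 0, 1, 0]
    predict = lambda rows: [1 if r[0] > 0.5 else 0 for r in rows]
    acc = accuracy(y, predict(X))
    print("accuracy:", acc)
    print("95% CI:", wilson_interval(round(acc * len(y)), len(y)))
    print("permutation importance:", permutation_importance(predict, X, y))
    probs = [0.9, 0.7, 0.3, 0.2, 0.95, 0.1]  # hypothetical predicted probabilities
    print("calibration:", calibration_points(y, probs, n_bins=5))
```

In the toy run, shuffling the ignored feature leaves accuracy unchanged, so its importance is zero. A well-calibrated model yields calibration points close to the diagonal. In a scikit-learn workflow, `sklearn.inspection.permutation_importance`, `sklearn.calibration.calibration_curve`, and `sklearn.feature_selection.RFE` would normally replace these hand-written helpers. LIME is not sketched here.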

Item Type: Article
Uncontrolled Keywords: Breast cancer, pathogenicity prediction, machine learning
Subjects: Q Science > Q Science (General) > Q300-390 Cybernetics
R Medicine > RC Internal medicine > RC0254 Neoplasms. Tumors. Oncology (including Cancer)
Divisions: Faculty of Engineering and Technology (FET)
Depositing User: Ms Rosnani Abd Wahab
Date Deposited: 10 Dec 2025 09:30
Last Modified: 10 Dec 2025 09:30
URI: http://shdl.mmu.edu.my/id/eprint/15059
