Addressing label noise in leukemia image classification using small loss approach and pLOF with weighted-average ensemble

Citation

Aziz, Md. Tarek and Mahmud, S.M. Hasan and Goh, Michael Kah Ong and Nandi, Dip (2024) Addressing label noise in leukemia image classification using small loss approach and pLOF with weighted-average ensemble. Egyptian Informatics Journal, 26. p. 100479. ISSN 1110-8665

Abstract

Machine learning (ML) and deep learning (DL) models have been extensively explored for the early diagnosis of various cancers, including leukemia, with many achieving performance comparable to that of human experts. However, challenges such as limited image data, inaccurate annotations, and unreliable predictions still hinder their broad adoption in a trustworthy computer-aided diagnosis (CAD) system. This paper introduces a novel weighted-average ensemble model for classifying Acute Lymphoblastic Leukemia, along with a reliable CAD system that combines the strengths of both ML and DL approaches. First, a variety of filtering methods are analyzed to determine the most suitable image representation, and data augmentation techniques are then applied to expand the training data. Second, a modified, fine-tuned VGG-19 model is proposed as a feature extractor to obtain meaningful features from the training samples. Third, a small-loss approach and a probabilistic local outlier factor (pLOF) are applied to the extracted features to address the label noise issue. Fourth, we propose a weighted-average ensemble model that uses the top five models as base learners, with weights calculated from their model uncertainty to ensure reliable predictions. Fifth, we calculate Shapley values, which are based on cooperative game theory, and use SHAP to perform feature selection over different feature combinations to determine the optimal number of features. Finally, we integrate these strategies to develop an interpretable CAD system. This system not only predicts the disease but also generates Grad-CAM images to visualize potentially affected areas, enhancing both clarity and diagnostic insight. All of our code is provided in the following repository:
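
As a rough illustration of two steps named in the abstract (this is not the authors' code, and it does not reproduce their repository), the sketch below shows one common reading of small-loss sample selection and of an uncertainty-weighted average ensemble. The function names, the `keep_ratio` parameter, and the inverse-uncertainty weighting rule are all assumptions made for this example; the abstract does not state how the weights are derived from model uncertainty, and pLOF is not implemented here.

```python
from typing import List, Sequence


def small_loss_indices(losses: Sequence[float], keep_ratio: float) -> List[int]:
    """Return indices of the keep_ratio fraction of samples with the smallest loss.

    Assumption: samples with small training loss are more likely to be
    correctly labeled, so only those are kept.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, int(len(losses) * keep_ratio))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:n_keep])


def uncertainty_weights(uncertainties: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Turn per-model uncertainty scores into normalized ensemble weights.

    Assumption: weight is inversely proportional to uncertainty, so less
    uncertain models contribute more. The paper may use a different rule.
    """
    inverse = [1.0 / (u + eps) for u in uncertainties]
    total = sum(inverse)
    return [v / total for v in inverse]


def weighted_average(probs_per_model: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Combine per-model class probabilities for one sample using the weights."""
    n_classes = len(probs_per_model[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probs_per_model))
        for c in range(n_classes)
    ]
```

In this sketch, `small_loss_indices` would be applied to per-sample losses before training the base learners, and `weighted_average` would be applied per sample to the class probabilities of the five base learners, using weights from `uncertainty_weights`.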

Item Type: Article
Uncontrolled Keywords: Machine learning
Subjects: Q Science > Q Science (General) > Q300-390 Cybernetics
Divisions: Faculty of Information Science and Technology (FIST)
Depositing User: Ms Nurul Iqtiani Ahmad
Date Deposited: 28 May 2024 07:12
Last Modified: 28 May 2024 07:12
URI: http://shdl.mmu.edu.my/id/eprint/12446
