A histogram SMOTE-based sampling algorithm with incremental learning for imbalanced data classification

Citation

Liaw, Lawrence Chuin Ming and Tan, Shing Chiang and Goh, Pey Yun and Lim, Chee Peng (2025) A histogram SMOTE-based sampling algorithm with incremental learning for imbalanced data classification. Information Sciences, 686. p. 121193. ISSN 0020-0255

1-s2.0-S0020025524011071-main.pdf - Published Version
Restricted to Repository staff only

Abstract

The synthetic minority over-sampling technique (SMOTE) is a well-known over-sampling method for handling imbalanced data. However, SMOTE and many of its variants are susceptible to several limitations, particularly ignoring intra-class imbalance and small disjuncts during over-sampling. To address the shortcomings of SMOTE, this paper presents a novel histogram SMOTE (H-SMOTE)-based sampling algorithm integrated with a fuzzy ARTMAP (FAM) network. The proposed H-SMOTE-FAM model operates as follows: first, the minority-class samples are organized into histogram bins, and a target sample number is computed for each available bin. Synthetic samples in each available bin are generated using SMOTE according to the target number. As all available bins contain the same numbers of minority-class samples (including the original and synthetic samples), the limitations of SMOTE are overcome. Next, the issue of increasing data size after SMOTE is solved by FAM through incremental learning. The classification performances of H-SMOTE-FAM are rigorously compared with those of SMOTE variants and generative adversarial network (GAN)-based over-sampling methods using two groups of imbalanced data sets. A series of comprehensive performance evaluations with statistical hypothesis testing demonstrates that H-SMOTE-FAM outperforms its baseline algorithms, several state-of-the-art SMOTE variants, and GANs in performing imbalanced data classification tasks.
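
The binning and per-bin over-sampling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how bins are formed or how the target number is computed, so the function name `histogram_smote`, the choice of equal-width bins over a single feature, the use of the fullest bin's size as the target, and the duplication of single-sample bins are all assumptions made for illustration. The FAM classifier and its incremental learning are not covered here.

```python
import numpy as np

def histogram_smote(X_min, feature=0, n_bins=5, k=3, seed=0):
    """Oversample minority samples so every non-empty bin reaches the same count.

    Illustrative sketch only; all parameters are hypothetical, not from the paper.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)

    # Assumption: equal-width bins built over one chosen feature.
    edges = np.histogram_bin_edges(X_min[:, feature], bins=n_bins)
    # Use interior edges only, so the maximum value lands in the last bin.
    bin_ids = np.digitize(X_min[:, feature], edges[1:-1])
    occupied = np.unique(bin_ids)

    # Assumption: the per-bin target is the size of the fullest bin.
    target = max(int(np.sum(bin_ids == b)) for b in occupied)

    synthetic = []
    for b in occupied:
        members = X_min[bin_ids == b]
        for _ in range(target - len(members)):
            base = members[rng.integers(len(members))]
            if len(members) == 1:
                # Assumption: a single-sample bin is filled by duplication,
                # since there is no neighbour to interpolate towards.
                synthetic.append(base.copy())
                continue
            # SMOTE step: interpolate between the base sample and one of
            # its k nearest neighbours taken from the same bin.
            dists = np.linalg.norm(members - base, axis=1)
            neighbours = members[np.argsort(dists)[1:k + 1]]
            partner = neighbours[rng.integers(len(neighbours))]
            synthetic.append(base + rng.random() * (partner - base))

    if not synthetic:
        return X_min
    return np.vstack([X_min, np.array(synthetic)])
```

Because each synthetic sample is interpolated between two samples of the same bin, it stays inside that bin, so after the call every occupied bin holds the same number of minority-class samples. The original samples are returned first, followed by the synthetic ones.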

Item Type: Article
Uncontrolled Keywords: Imbalanced data, Organized data-level solution, Histogram, SMOTE, Fuzzy ARTMAP
Subjects: Q Science > QA Mathematics > QA150-272.5 Algebra
Divisions: Faculty of Information Science and Technology (FIST)
Depositing User: Ms Nurul Iqtiani Ahmad
Date Deposited: 08 Oct 2024 00:40
Last Modified: 08 Oct 2024 00:40
URI: http://shdl.mmu.edu.my/id/eprint/13057
