Ensemble-SMOTE: Mitigating Class Imbalance in Graduate on Time Detection

Citation

Law, Theng Jia and Ting, Choo-Yee and Ng, Hu and Goh, Hui-Ngo and Quek, Albert. Ensemble-SMOTE: Mitigating Class Imbalance in Graduate on Time Detection. Journal of Informatics and Web Engineering, 3 (2). ISSN 2821-370X

Full text: 1076-Article Text-8551-1-10-20240607.pdf (Published Version, 3MB). Restricted to repository staff only.

Abstract

In education, detecting whether students will graduate on time is difficult because of the high complexity of the data. Researchers have applied various Machine Learning approaches to identifying on-time graduation, but the task remains challenging because of class imbalance in the dataset. This study aims to (i) compare various class imbalance treatment methods at different sampling ratios, (ii) propose an ensemble class imbalance treatment method to mitigate the problem of class imbalance, and (iii) develop and evaluate predictive models that identify the likelihood of students graduating on time during their university studies. The dataset was collected from 4,007 graduates of a university in 2021 and 2022 and contains 41 variables. After feature selection, various class imbalance treatment methods were compared at sampling ratios ranging from 50% to 90%. Ensemble-SMOTE is proposed to aggregate the datasets generated by Synthetic Minority Oversampling Technique (SMOTE) variants in order to mitigate class imbalance effectively. The datasets generated by the class imbalance treatment methods were used as input to the predictive models for detecting on-time graduation. The predictive models were evaluated on accuracy, precision, recall, F0.5-score, F1-score, F2-score, Area Under the Curve (AUC), and Area Under the Precision-Recall Curve (AUPRC). Logistic Regression with Ensemble-SMOTE outperformed the other predictive models and class imbalance treatment methods, achieving the highest average accuracy (87.24%), recall (92.50%), F1-score (91.30%), and F2-score (92.02%) from the 6th to the 10th trimester. To assess the effectiveness of the class imbalance treatment methods, the Shapiro-Wilk test was first applied to check normality, and the Friedman test was then performed to determine whether the models differed significantly. Ensemble-SMOTE ranked as the top performer, achieving the lowest average rank across the performance metrics.
Future research could incorporate and examine more sophisticated approaches to mitigating class imbalance in highly imbalanced datasets.
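The abstract does not spell out how Ensemble-SMOTE aggregates the variant datasets, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' method. It implements plain SMOTE from scratch (interpolating between a minority sample and one of its k nearest minority neighbours), reads a "sampling ratio" as the minority-to-majority ratio after resampling (the convention of imbalanced-learn's float `sampling_strategy`), and uses a hypothetical `ensemble_smote` that pools synthetic rows from several SMOTE runs with different `k` as stand-ins for the SMOTE variants. All function names (`smote`, `n_synthetic`, `smote_resample`, `ensemble_smote`) are illustrative.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Plain SMOTE: each synthetic row is a random point on the segment between
    a minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    if n_new <= 0 or n < 2:  # nothing to add, or no neighbour to interpolate with
        return np.empty((0, X_min.shape[1]))
    k = min(k, n - 1)
    # Pairwise Euclidean distances inside the minority class (O(n^2) memory).
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)                    # seed samples
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]   # one neighbour per seed
    gap = rng.random((n_new, 1))                             # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


def n_synthetic(y, ratio, minority=1):
    """Rows to add so that n_minority / n_majority reaches `ratio` afterwards
    (ASSUMED reading of the 50%-90% sampling ratios)."""
    n_min = int(np.sum(y == minority))
    n_maj = int(np.sum(y != minority))
    return max(0, int(round(ratio * n_maj)) - n_min)


def smote_resample(X, y, ratio, k=5, seed=0, minority=1):
    """Plain SMOTE at one sampling ratio; original rows are kept first."""
    X_syn = smote(X[y == minority], n_synthetic(y, ratio, minority), k=k, rng=seed)
    return (np.vstack([X, X_syn]),
            np.concatenate([y, np.full(len(X_syn), minority)]))


def ensemble_smote(X, y, ratio, ks=(3, 5, 7), seed=0, minority=1):
    """HYPOTHETICAL aggregation: pool synthetic rows from several generators
    (plain SMOTE with different k, standing in for SMOTE variants) and draw
    the number of rows needed to reach `ratio`."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = n_synthetic(y, ratio, minority)
    pool = np.vstack([smote(X_min, n_new, k=k, rng=rng) for k in ks])
    take = min(n_new, len(pool))
    chosen = pool[rng.choice(len(pool), size=take, replace=False)]
    return (np.vstack([X, chosen]),
            np.concatenate([y, np.full(len(chosen), minority)]))


# Usage on toy data: 100 majority rows (label 0), 20 minority rows (label 1).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (100, 4)), rng.normal(2, 1, (20, 4))])
y = np.array([0] * 100 + [1] * 20)
for ratio in (0.5, 0.7, 0.9):
    Xr, yr = ensemble_smote(X, y, ratio)
    print(ratio, int(np.sum(yr == 1)), int(np.sum(yr == 0)))
# 0.5 50 100
# 0.7 70 100
# 0.9 90 100
```

Drawing from the pool keeps the minority-to-majority ratio identical across treatment methods, so only the source of the synthetic rows differs. A closer reproduction would replace the `k`-varied runs with actual SMOTE variants (for example Borderline-SMOTE or SVM-SMOTE), which the abstract does not name, and would resample the training split only so that synthetic rows never reach the evaluation data.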

Item Type: Article
Uncontrolled Keywords: Graduate on Time, In-University, Class Imbalance, Artificial Intelligence, Machine Learning
Subjects: Q Science > QA Mathematics > QA71-90 Instruments and machines
Divisions: Faculty of Computing and Informatics (FCI)
Depositing User: Ms Suzilawati Abu Samah
Date Deposited: 11 Jul 2025 01:19
Last Modified: 11 Jul 2025 01:19
URI: http://shdl.mmu.edu.my/id/eprint/14250
