A Comparative Analysis of Machine Learning Algorithms for Spam Email Classification with Explainable AI Insights

Citation

Hoque Sawon, Md Saimul and Chakraborty, Shuvo and Rabbi, Riadul Islam and Liew, Tze Hui and Tushe, Ummay Ayman and Haque Tusher, Ekramul (2025) A Comparative Analysis of Machine Learning Algorithms for Spam Email Classification with Explainable AI Insights. In: 8th 2025 International Conference on New Media Studies, CONMEDIA 2025, 14 October 2025 - 17 October 2025, Malacca, Malaysia.

Text: IEEE Xplore Full-Text PDF_2.pdf - Published Version (1MB)
Restricted to Repository staff only

Abstract

Spam emails, or unsolicited emails, are a major problem for digital communication systems because they fill inboxes, waste time, and expose users to phishing and viruses. To keep communication safe and productive, especially for people and businesses that rely heavily on email, spam must be detected and filtered quickly. Addressing this problem can boost productivity, protect private information, and reduce the effort of filtering emails by hand. This paper presents a comparative analysis of six machine learning methods for spam email detection: Support Vector Machine (SVM), Random Forest (RF), Naive Bayes (NB), Decision Tree (DT), Logistic Regression (LR), and K-Nearest Neighbors (K-NN). We use a dataset of 20,000 labeled emails from Kaggle, with 80% used for training and 20% for testing. We compare the models on accuracy, precision, recall, and F1-score. The SVM model outperformed the others, with an accuracy of 97.18%, a precision of 95.87%, a recall of 98.26%, and an F1-score of 97.05%. To make the model's decisions easier to interpret, Explainable AI (XAI) methods such as SHAP values and LIME were applied. These methods showed that the presence of particular terms and phrases often associated with spam was highly influential in classification decisions, which made the model's predictions more transparent. The results show that SVM outperformed the other five models at detecting spam on this dataset and that XAI approaches can help build models that are more dependable and easier to understand for real-world use.
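The four evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, and spam is assumed to be the positive class. The last lines check that the reported F1-score is consistent with the reported precision and recall, since F1 is their harmonic mean.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    Assumption: spam is the positive class, so tp = spam caught,
    fp = legitimate mail flagged as spam, fn = spam missed.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from_precision_recall(precision, recall),
    }


# Precision and recall reported for the SVM model in the abstract.
svm_f1 = f1_from_precision_recall(0.9587, 0.9826)
print(f"F1 implied by reported precision/recall: {svm_f1:.2%}")  # 97.05%
```

The implied value matches the 97.05% F1-score stated in the abstract. The confusion-matrix counts themselves are not given in this record, so classification_metrics is shown only to make the definitions concrete.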

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Email spam classification, machine learning
Subjects: Q Science > QA Mathematics > QA71-90 Instruments and machines > QA75.5-76.95 Electronic computers. Computer science
Divisions: Faculty of Engineering and Technology (FET)
Faculty of Information Science and Technology (FIST)
Depositing User: Ms Rosnani Abd Wahab
Date Deposited: 20 Apr 2026 01:49
Last Modified: 20 Apr 2026 01:49
URI: http://shdl.mmu.edu.my/id/eprint/15745
