Citation
Kustiawan, Yanche Ari and Ghauth, Khairil Imran (2025) Evaluating the Impact of Feature Engineering in Phishing URL Detection: A Comparative Study of URL, HTML, and Derived Features. IEEE Access. p. 1. ISSN 2169-3536
Text
9.pdf - Published Version. Restricted to Repository staff only. Download (1MB)
Abstract
Phishing attacks have evolved into sophisticated threats, making effective cybersecurity detection strategies essential. While many studies focus on either URL or HTML features, limited work has explored the comparative impact of engineered feature sets across different machine learning models. This study aims to bridge that empirical gap by evaluating the effectiveness of URL-based, HTML-based, and derived features, individually and in combination, on phishing URL detection. The proposed approach utilizes the PhishOFE dataset of 101,063 phishing and legitimate URLs. Features are organized into four sets: (1) URL only, (2) HTML only, (3) URL + HTML, and (4) URL + HTML + derived features. Ten machine learning models are employed, including Random Forest, k-Nearest Neighbors, Logistic Regression, Support Vector Machine, Naive Bayes, and advanced ensemble methods such as LightGBM, XGBoost, and CatBoost. Performance is assessed using accuracy, precision, recall, and F1-score, while permutation importance is used to evaluate feature significance. Experimental results demonstrate that ensemble models outperform traditional classifiers, with CatBoost achieving the highest accuracy of 99.45% using the complete feature set. Moreover, URL features like URLLength and NoOfSubDomain consistently rank high in importance, while derived features such as SuspiciousCharRatio and URLComplexityScore notably enhance detection performance in specific models.
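The abstract names permutation importance as the method for ranking features. The idea can be sketched in plain Python as follows: shuffle one feature column at a time and record how much accuracy falls. This is only an illustration under stated assumptions, not the authors' implementation. The toy rows, the `toy_model` threshold rule, and the helper names are all hypothetical and do not come from the paper or the PhishOFE dataset.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return hits / len(labels)


def permutation_importance(model, rows, labels, n_features, n_repeats=10, seed=0):
    """Mean accuracy drop after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other column untouched.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Invented toy data: columns are [url_length, noise]; label 1 = phishing.
rows = [[120, 3], [95, 7], [110, 1], [20, 5], [35, 2], [28, 9]]
labels = [1, 1, 1, 0, 0, 0]


def toy_model(row):
    # Hypothetical stand-in classifier: flags long URLs, ignores the noise column.
    return 1 if row[0] > 60 else 0


importances = permutation_importance(toy_model, rows, labels, n_features=2)
print(importances)
```

Because `toy_model` never reads the second column, shuffling it leaves every prediction unchanged and its importance is exactly zero. Any accuracy lost when the first column is shuffled is credited to that feature. In the study itself the same procedure would be applied to trained classifiers such as CatBoost rather than to a hand-written rule.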
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Phishing URL detection, machine learning, feature engineering, URL features, HTML features, derived features, ensemble learning. |
Subjects: | Q Science > QA Mathematics > QA71-90 Instruments and machines |
Divisions: | Faculty of Computing and Informatics (FCI) |
Depositing User: | Ms Suzilawati Abu Samah |
Date Deposited: | 30 Jun 2025 03:14 |
Last Modified: | 30 Jun 2025 03:14 |
URI: | http://shdl.mmu.edu.my/id/eprint/14151 |