Feature Engineering for Phishing Website Detection Using Machine Learning: A Systematic Review

Citation

Ari Kustiawan, Yanche and Ghauth, Khairil Imran (2025) Feature Engineering for Phishing Website Detection Using Machine Learning: A Systematic Review. IEEE Access, 13. pp. 192080-192104. ISSN 2169-3536

Feature_Engineering_for_Phishing_Website_Detection_Using_Machine_Learning_A_Systematic_Review.pdf - Published Version
Restricted to Repository staff only


Abstract

Phishing website detection has been widely explored, yet most existing surveys focus on detection algorithms or general cybersecurity methods and give feature engineering only brief treatment. This systematic literature review addresses this gap by analyzing recent studies on feature engineering for phishing URL detection using machine learning. The review categorizes feature types, such as URL/lexical, HTML/content, domain, behavioral, and emerging derived and hybrid features, and examines how feature selection and dimensionality reduction techniques are applied. The findings reveal a persistent reliance on URL/lexical features for their simplicity and broad applicability, despite growing interest in integrating content-based and composite approaches for improved robustness. Key challenges include adapting to evolving phishing tactics, handling dataset limitations, mitigating high dimensionality, and balancing computational efficiency with detection accuracy. The review also identifies opportunities underexplored in prior work, such as leveraging explainable AI methods (e.g., SHAP, LIME) to validate and optimize feature sets, expanding cross-lingual and adversarially resilient features, and developing lightweight solutions for real-time or resource-constrained environments. By consolidating and reviewing the role of feature engineering, this study offers a targeted roadmap for advancing phishing detection research and highlights feature engineering as pivotal to creating more accurate, adaptive, and efficient detection systems.
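To make the URL/lexical feature category concrete, here is a minimal sketch that is not taken from the review itself. It shows a hypothetical `lexical_features` function that computes a few lexical indicators of the kind often used in phishing URL studies (URL length, dots and hyphens in the host, an IP address used as the host, an `@` symbol, HTTPS use, path depth) using only the Python standard library. The function name, feature names, and the choice of features are illustrative assumptions. Consult the review for the feature sets the surveyed studies actually use.

```python
from urllib.parse import urlparse
import ipaddress


def lexical_features(url: str) -> dict:
    """Hypothetical URL/lexical feature extractor (illustrative only).

    Computes simple string-level indicators from the URL alone, without
    fetching the page, which is what makes lexical features cheap.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""  # hostname is lowercased, port/userinfo stripped

    # Flag hosts that are raw IP addresses rather than domain names.
    try:
        ipaddress.ip_address(host)
        has_ip_host = 1
    except ValueError:
        has_ip_host = 0

    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),        # many subdomains can signal obfuscation
        "num_hyphens": host.count("-"),     # e.g. brand-name-lookalike-domain
        "num_digits": sum(c.isdigit() for c in url),
        "has_at_symbol": int("@" in url),   # '@' can hide the real host
        "has_ip_host": has_ip_host,
        "uses_https": int(parsed.scheme == "https"),
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }


if __name__ == "__main__":
    for u in ("http://192.168.0.1/secure-login/verify.php",
              "https://paypal.com-account-verify.example.net/login"):
        print(u, lexical_features(u))
```

Each dictionary can become one row of a feature matrix for a standard classifier. In the pipelines the review covers, such a matrix is the usual input to the feature selection and dimensionality reduction steps it examines.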

Item Type: Article
Uncontrolled Keywords: Cybersecurity, Feature engineering, machine learning, phishing website detection, systematic review
Subjects: Q Science > QA Mathematics > QA71-90 Instruments and machines > QA75.5-76.95 Electronic computers. Computer science
Divisions: Faculty of Computing and Informatics (FCI)
Depositing User: Nor Afiqah Mohd Adnan
Date Deposited: 10 Dec 2025 02:47
Last Modified: 13 Dec 2025 03:31
URI: http://shdl.mmu.edu.my/id/eprint/15012
