
Contra-KD: A Lightweight Transformer Model for Malicious URL Detection with Contrastive Representation and Model Distillation

Citation

Lim, Zheng You and Pang, Ying Han and Jun, Edwin Chan Kah and Ooi, Shih Yin and Ling, Goh Fan (2026) Contra-KD: A Lightweight Transformer Model for Malicious URL Detection with Contrastive Representation and Model Distillation. Future Internet, 18 (3). p. 157. ISSN 1999-5903

Text
futureinternet-18-00157-v2.pdf - Published Version
Restricted to Repository staff only

Official URL: https://doi.org/10.3390/fi18030157

Abstract

Malicious URLs are widely regarded as a serious threat to cybersecurity, serving as pathways to phishing, malicious activity, and other offenses. Although transformer-based models have demonstrated good performance in malicious URL detection, their high computational cost and latency make them impractical for deployment in real-time or resource-constrained systems. Lightweight models built on knowledge distillation (KD) tend to be efficient but are often not discriminative enough to distinguish malicious from benign URLs with subtle lexical overlaps, particularly on imbalanced datasets. To address these issues, we propose Contra-KD, a lightweight transformer model that incorporates contrastive learning (CL) and KD. The proposed framework imposes structured embedding matching, allowing the student model to learn more meaningful and generalized representations. Contra-KD uses a compact 6-layer student transformer based on ELECTRA to keep the parameter count small, and can achieve more than 90% computational fidelity with high accuracy. In this scheme, CL improves feature discrimination by semantically clustering similar URLs and separating dissimilar ones. This limits confusion, especially when two URLs share a common lexical trait and/or in the presence of adversarial obfuscation. On a large-scale, publicly available Kaggle dataset of 651,191 URLs in imbalanced scenarios, Contra-KD achieves 99.05% accuracy, 99.96% ROC-AUC, and 98.18% MCC, outperforming its counterparts, including lightweight and transformer-based models. In summary, Contra-KD offers a transformer architecture that is both small and computationally efficient while delivering stable detection performance.
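The abstract combines knowledge distillation, structured embedding matching, and contrastive learning, but does not state the training objective. As a rough illustration only, the sketch below assumes one common way to combine such terms: a hard-label cross-entropy loss, Hinton-style soft-label distillation (KL divergence on temperature-softened outputs, scaled by T^2), a mean-squared-error embedding-matching term, and a supervised contrastive loss on the student's URL embeddings. All function names, the loss weights (alpha, beta, gamma, delta), the temperature, and tau are hypothetical and are not taken from the paper; pure Python is used so the arithmetic is easy to follow.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Hard-label loss against the ground truth (0 = benign, 1 = malicious, assumed)."""
    return -math.log(softmax(logits)[label])


def kd_loss(student_logits, teacher_logits, temperature: float = 2.0) -> float:
    """Soft-label distillation: KL(teacher || student) on softened outputs, scaled by T^2."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl


def embedding_match_loss(student_emb, teacher_emb) -> float:
    """Mean squared error between student and teacher embeddings (equal sizes assumed)."""
    return sum((s - t) ** 2 for s, t in zip(student_emb, teacher_emb)) / len(student_emb)


def cosine(u, v) -> float:
    """Cosine similarity, guarded against zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norms, 1e-12)


def supcon_loss(embeddings, labels, tau: float = 0.1) -> float:
    """Supervised contrastive loss: pull same-label URLs together, push others apart."""
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        sims = {j: cosine(embeddings[i], embeddings[j]) / tau for j in range(n) if j != i}
        m = max(sims.values())
        # log-sum-exp over all non-anchor samples, computed stably
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def total_loss(batch: List[Dict], alpha=1.0, beta=1.0, gamma=0.5, delta=0.1,
               temperature=2.0, tau=0.1) -> float:
    """Weighted sum of the four terms; every weight here is a hypothetical default."""
    n = len(batch)
    ce = sum(cross_entropy(x["student_logits"], x["label"]) for x in batch) / n
    kd = sum(kd_loss(x["student_logits"], x["teacher_logits"], temperature) for x in batch) / n
    match = sum(embedding_match_loss(x["student_emb"], x["teacher_emb"]) for x in batch) / n
    con = supcon_loss([x["student_emb"] for x in batch], [x["label"] for x in batch], tau)
    return alpha * ce + beta * kd + gamma * con + delta * match
```

In this formulation the contrastive term is what pushes benign and malicious URLs with overlapping lexical features into separate regions of the embedding space, while the distillation and matching terms transfer the teacher's output distribution and representations to the smaller student. The T^2 factor is the usual convention for keeping the distillation term's gradient magnitude comparable to the hard-label term as the temperature changes.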

Item Type:	Article
Uncontrolled Keywords:	Transformer models, contrastive learning
Subjects:	T Technology > TK Electrical engineering. Electronics Nuclear engineering > TK2896-2985 Production of electricity by direct energy conversion
Divisions:	Faculty of Engineering and Technology (FET); Faculty of Information Science and Technology (FIST)
Depositing User:	Ms Rosnani Abd Wahab
Date Deposited:	04 May 2026 02:32
Last Modified:	04 May 2026 02:32
URI:	http://shdl.mmu.edu.my/id/eprint/15834
