Comparison of Label Encoding and Evidence Counting for Malware Classification


Low, Min Xuan and Yap, Timothy Tzen Vun and Soo, Wooi King and Ng, Hu and Goh, Vik Tor and Chin, Ji Jian and Kuek, Thiam Yong (2022) Comparison of Label Encoding and Evidence Counting for Malware Classification. Journal of System and Management Sciences, 12 (6). pp. 17-30. ISSN 1816-6075, 1818-0523


Malware is software intended to attack or harm a computer system without the owner's knowledge or permission. Traditional methods such as signature-based and behaviour-based malware analysis depend heavily on manual inspection and detection by analysts. Although reliable, this manual effort becomes nearly impossible to sustain against techniques like polymorphism, metamorphism and obfuscation. Thus, this project aims to develop models to detect malware in operation. Two different techniques are considered for feature engineering, namely label encoding and evidence counting. Feature selection is applied to isolate the less significant features. Machine learning models such as Random Forest (RF), Decision Tree (DT), K-Nearest Neighbour (K-NN) and Support Vector Machine (SVM) classifiers, Multilayer Perceptron (MLP) and Long Short-Term Memory (LSTM) are applied to compare the efficacy of the two feature engineering approaches. The performance of the models is evaluated through accuracy, precision, recall, F1-score, and loss. Tree-based models such as RF and DT seem to be more suitable for mining patterns from labelled data, as their performance is generally better with the label encoding approach. SVM and K-NN tend not to cope well with labelled data in this study. The deep learning approaches in this study have shown potential in malware classification, but further improvements are required to build a robust solution for complex real-world malware detection and classification.
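
The difference between the two feature engineering approaches named in the abstract can be illustrated with a minimal sketch. This record does not describe the dataset or the exact procedure used in the paper, so everything below is an assumption made for illustration: the samples are hypothetical sequences of API call names, label encoding is taken to mean mapping each categorical value to an integer, and evidence counting is taken to mean counting how often each value occurs in a sample.

```python
from collections import Counter

# Hypothetical toy samples (not from the paper): each sample is a
# sequence of observed API call names.
samples = [
    ["CreateFile", "WriteFile", "WriteFile", "CloseHandle"],
    ["RegOpenKey", "CreateFile", "CloseHandle"],
]

# Shared vocabulary, sorted so the integer mapping is deterministic.
vocab = sorted({call for seq in samples for call in seq})
index = {call: i for i, call in enumerate(vocab)}


def label_encode(seq):
    """Replace each categorical value with its integer label, keeping order."""
    return [index[call] for call in seq]


def evidence_count(seq):
    """Count occurrences of each vocabulary item.

    This is an assumed reading of "evidence counting"; the paper's
    actual definition may differ.
    """
    counts = Counter(seq)
    return [counts[call] for call in vocab]


encoded = [label_encode(s) for s in samples]
counted = [evidence_count(s) for s in samples]

print(vocab)
print(encoded)
print(counted)
```

Under these assumptions, label encoding preserves the order of events but yields vectors whose length varies with the sample, while counting yields a fixed-length vector per sample and discards order. That may be one reason different model families respond differently to the two representations, though the record itself does not state this.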

Item Type: Article
Uncontrolled Keywords: Malware detection, machine learning, deep learning, classification
Subjects: Q Science > Q Science (General) > Q300-390 Cybernetics
Divisions: Faculty of Computing and Informatics (FCI)
Faculty of Engineering (FOE)
Depositing User: Ms Nurul Iqtiani Ahmad
Date Deposited: 22 Mar 2023 01:50
Last Modified: 22 Mar 2023 01:50

