Feature Selection for Financial Data Classification Using Random Forest, Boruta, and Recursive Feature Elimination

Citation

Dewi, Christine and Andika, Rio Arya and Haryani, Endang and Riantama, Dalianus and Sajid, Ahthasham and Alam, Muhammad Mansoor and Mohd Su'ud, Mazliham (2025) Feature Selection for Financial Data Classification Using Random Forest, Boruta, and Recursive Feature Elimination. Ingénierie des systèmes d'information, 30 (08). pp. 2165-2173. ISSN 1633-1311

isi_30.08_22.pdf - Published Version
Restricted to Repository staff only

Abstract

The increasing availability of financial data has accelerated the use of machine learning for classification tasks in finance. However, financial datasets are often high-dimensional and noisy, which can degrade model performance and increase computational costs. Feature selection serves as a critical pre-processing step to reduce dimensionality and improve efficiency. This study compares three feature selection methods—Random Forest, Boruta, and Recursive Feature Elimination (RFE)—in the context of financial data classification. The analysis is conducted using three publicly available datasets: Adult Income, Marketing Campaign, and Taiwanese Bankruptcy. A variety of machine learning classifiers are applied to evaluate the impact of feature selection on classification accuracy. Experimental results show that Random Forest Classifier (RFC), particularly with hyperparameter tuning, consistently achieves strong performance across datasets. The combination of RFE and RFC yields the highest accuracy on the Taiwanese Bankruptcy dataset. These findings highlight the importance of selecting relevant features to optimize classification models in finance. The study offers practical insights for enhancing predictive accuracy in financial applications such as credit risk assessment, fraud detection, and customer profiling, thereby contributing to the development of more robust and interpretable machine learning models in the financial sector.
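
As an illustration of the RFE and Random Forest combination named in the abstract, the following is a minimal sketch using scikit-learn. It is not the authors' pipeline: the synthetic data from make_classification stands in for the study's datasets, and the sample size, number of retained features, tree count, and the helper name fit_and_score are all assumptions chosen for illustration. Boruta is omitted because it is not part of scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, NOT one of the study's datasets):
# 20 features, of which 5 are informative and 2 are redundant.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)


def fit_and_score(X_tr, X_te):
    """Hypothetical helper: train a Random Forest and return test accuracy."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_tr, y_train)
    return accuracy_score(y_test, clf.predict(X_te))


# Baseline: Random Forest trained on all 20 features.
baseline_acc = fit_and_score(X_train, X_test)

# RFE repeatedly fits the estimator and drops the least important feature
# (step=1) until only n_features_to_select remain.
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    n_features_to_select=5,  # assumed value, chosen only for this sketch
    step=1,
)
selector.fit(X_train, y_train)
selected = selector.get_support(indices=True)

# Retrain on the reduced feature set and score on the same test split.
rfe_acc = fit_and_score(selector.transform(X_train), selector.transform(X_test))

print("Selected feature indices:", list(selected))
print(f"Accuracy with all features: {baseline_acc:.3f}")
print(f"Accuracy with RFE-selected features: {rfe_acc:.3f}")
```

The selector is fitted on the training split only, so the test split plays no part in choosing features. Whether the reduced set matches or beats the baseline depends on the data, so the two printed accuracies should be read as a comparison on this toy data, not as a reproduction of the reported results.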

Item Type: Article
Uncontrolled Keywords: Financial data, random forest, boruta
Subjects: Q Science > QA Mathematics > QA71-90 Instruments and machines
Divisions: Faculty of Computing and Informatics (FCI)
Depositing User: Ms Rosnani Abd Wahab
Date Deposited: 22 Dec 2025 04:35
Last Modified: 26 Dec 2025 04:25
URI: http://shdl.mmu.edu.my/id/eprint/15102
