Clustering Disaggregated Household Consumption Expenditure for Policy Targeting: Evidence from Malaysia's HES 2022

Citation

Lee, Eng Keat and Ong, Thian Song and Lee, Yvonne Lean Ee (2026) Clustering Disaggregated Household Consumption Expenditure for Policy Targeting: Evidence from Malaysia's HES 2022. In: 8th International Symposium on Computational and Business Intelligence, ISCBI 2026, 6 February 2026 - 8 February 2026, Bali Island.

Text
5.pdf - Published Version
Restricted to Repository staff only

Official URL: https://doi.org/10.1109/ISCBI69404.2026.11496179

Abstract

This paper investigates whether disaggregated household consumption can be used to identify households facing budget constraints by applying machine learning clustering to household-head observations from the Household Expenditure Survey (HES) in Malaysia. Unsupervised clustering is performed using K-Means and Gaussian Mixture Models to segment households based on their consumption profiles. We transform the original 13 disaggregated consumption expenditure features to a per-capita scale using the OECD equivalence scale. To address the highly right-skewed and sparse nature of the data, we augment the dataset with two additional binary features and ten engineered summary features. The performance of K-Means and Gaussian Mixture Models is then evaluated across four alternative preprocessing pipelines (Robust scaling, Winsorization + Robust scaling, K-distance trimming, K-distance trimming + Winsorization). Following preprocessing, dimensionality reduction is performed using Principal Component Analysis (PCA), and clustering quality is then assessed across the preprocessing pipelines using the silhouette score. Results suggest the presence of four stable and interpretable clusters for policy recommendations. Among them, two clusters are identified as budget-constrained groups with low total spending and disproportionately high allocations to Food and Beverage spending. The best-performing configuration is the K-Means model on the robust pipeline, with a silhouette score of 0.7031. These findings demonstrate that disaggregated consumption data, combined with simple engineered features, allow the K-Means model to effectively segment households into well-separated clusters. This study contributes to evidence-based policymaking by supporting targeted interventions such as cash transfers for budget-constrained groups. Moreover, this study presents a reproducible preprocessing and clustering pipeline for handling sparse and highly skewed data.
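The steps named in the abstract (equivalence scaling, robust scaling, K-Means, silhouette scoring) can be illustrated with a minimal sketch. This is not the authors' code. The abstract does not say which OECD scale is used, so the OECD-modified weights (1.0, 0.5, 0.3) are an assumption here, and the toy households, the food-share feature, and the starting centroids are hypothetical. The sketch uses only the Python standard library; in practice a library such as scikit-learn (`RobustScaler`, `KMeans`, `silhouette_score`) would likely be used instead.

```python
import math
import statistics


def oecd_modified_size(n_adults, n_children):
    """Equivalent household size; OECD-modified weights are an assumption."""
    # 1.0 for the first adult, 0.5 per further adult, 0.3 per child.
    return 1.0 + 0.5 * max(n_adults - 1, 0) + 0.3 * n_children


def robust_scale(values):
    """Centre on the median and divide by the interquartile range."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # guard against a zero IQR in sparse columns
    return [(v - med) / iqr for v in values]


def kmeans(points, init_centroids, n_iter=20):
    """Plain Lloyd's algorithm from caller-supplied starting centroids."""
    centroids = list(init_centroids)
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def silhouette_score(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        sums, counts = {}, {}
        for j, q in enumerate(points):
            if i != j:
                sums[labels[j]] = sums.get(labels[j], 0.0) + math.dist(p, q)
                counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        if own not in counts:  # singleton cluster: score taken as 0
            scores.append(0.0)
            continue
        a = sums[own] / counts[own]  # mean distance within own cluster
        b = min(sums[c] / counts[c] for c in counts if c != own)  # nearest other
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: (total_spend, food_spend, adults, children).
households = [
    (1800, 900, 2, 3),
    (1500, 800, 2, 2),
    (1600, 850, 1, 3),
    (9000, 1500, 2, 1),
    (12000, 1800, 2, 0),
    (10000, 1600, 3, 1),
]

equiv_spend = [t / oecd_modified_size(a, c) for t, f, a, c in households]
food_share = [f / t for t, f, a, c in households]  # illustrative summary feature
points = list(zip(robust_scale(equiv_spend), robust_scale(food_share)))

# Deterministic start from two of the points (a hypothetical choice).
labels = kmeans(points, [points[0], points[3]])
score = silhouette_score(points, labels)
print(labels, round(score, 3))
```

On this toy data the low-spending, high-food-share households fall into one cluster and the remaining households into the other. The paper's actual pipeline additionally involves Winsorization, K-distance trimming, PCA, and Gaussian Mixture Models, which this sketch omits.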

Item Type:	Conference or Workshop Item (Paper)
Uncontrolled Keywords:	Consumption expenditure, clustering, K-Means, Gaussian mixture model, socioeconomic
Subjects:	H Social Sciences > HB Economic theory. Demography > HB71-74 Economics as a science. Relation to other subjects
Divisions:	Faculty of Information Science and Technology (FIST), Faculty of Management (FOM)
Depositing User:	Ms Rosnani Abd Wahab
Date Deposited:	30 Jun 2026 03:58
Last Modified:	30 Jun 2026 03:58
URI:	http://shdl.mmu.edu.my/id/eprint/16127