Uncertainty aware and explainable construction cost prediction using a hybrid probabilistic learning model

Citation

Chen, Lifei and Khalid, Othman Waleed and Tiang, Jun Jiat and Tiang, Sew Sun and Shin, Dong Youn and Choo, Yit Hong and Sharma, Abhishek and Lim, Wei Hong (2026) Uncertainty aware and explainable construction cost prediction using a hybrid probabilistic learning model. Scientific Reports, 16 (1). ISSN 2045-2322

s41598-026-44904-8.pdf - Published Version
Restricted to Repository staff only

Abstract

This paper presents NGBoost-ETR (Natural Gradient Boosting with Extra Trees base learners), a unified probabilistic framework for construction cost forecasting that delivers predictive accuracy, calibrated uncertainty, and SHAP-based interpretability. Rather than positioning novelty in algorithmic integration alone, the contribution lies in a systems-level design that jointly addresses three critical gaps in cost modeling: reliable interval calibration, model interpretability, and robustness across complex feature interactions in domains such as sustainable building. Trained on a real-world RSMeans dataset of 4477 samples, NGBoost-ETR achieves superior predictive performance (R² = 0.9866, RMSE = 0.4986, MSE = 0.2486, MAE = 0.2300, and MAPE = 1.4314%) compared to 10 baseline regressors and 9 NGBoost-based hybrids. Beyond point prediction, the model also demonstrates robust probabilistic calibration, validated through a suite of six quantitative metrics: Prediction Interval Coverage Probability (PICP), Prediction Interval Normalized Average Width (PINAW), Mean Prediction Interval Width (MPIW), Coverage Width-based Criterion (CWC), Negative Log-Likelihood (NLL), and Continuous Ranked Probability Score (CRPS). In head-to-head ablations, NGBoost-ETR attains the best interval efficiency (lowest PINAW and MPIW), overall calibration (lowest CWC), and distributional accuracy (lowest CRPS), with competitive NLL and acceptable coverage, outperforming variants that inflate coverage via impractically wide intervals. Crucially, not all hybrids are beneficial (e.g., some NGBoost pairings underperform their base learners), emphasizing that the ETR pairing is a validated choice rather than a generic integration. This work introduces a process innovation by embedding reliable uncertainty estimation into tree-based models without sacrificing performance, supporting greater resource efficiency in cost planning and estimation. The resulting framework not only supports data-driven budgeting and tendering but also promotes transparent institutions and risk-aware decision-making in construction management.
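
The calibration metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names and the toy numbers are hypothetical, the NLL and CRPS use closed forms that assume a Gaussian predictive distribution (the abstract does not state which distribution the model outputs), and PINAW is normalized here by the range of the observed targets. CWC is left out because its definition varies across the literature.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the Gaussian CRPS


def interval_metrics(y_true, lower, upper):
    """Return (PICP, MPIW, PINAW) for paired prediction-interval bounds."""
    n = len(y_true)
    # PICP: fraction of observations that fall inside their interval.
    picp = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper)) / n
    # MPIW: average interval width.
    mpiw = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    # PINAW: MPIW normalized by the target range (assumed normalization).
    pinaw = mpiw / (max(y_true) - min(y_true))
    return picp, mpiw, pinaw


def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under N(mu, sigma^2)."""
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2.0 * sigma ** 2)


def gaussian_crps(y, mu, sigma):
    """Closed-form CRPS of y under N(mu, sigma^2)."""
    z = (y - mu) / sigma
    return sigma * (
        z * (2.0 * STD_NORMAL.cdf(z) - 1.0)
        + 2.0 * STD_NORMAL.pdf(z)
        - 1.0 / math.sqrt(math.pi)
    )


# Hypothetical toy data: four observations with their interval bounds.
y_true = [10.0, 12.0, 15.0, 20.0]
lower = [9.0, 11.0, 16.0, 18.0]
upper = [11.0, 13.0, 18.0, 22.0]

picp, mpiw, pinaw = interval_metrics(y_true, lower, upper)
print(picp, mpiw, pinaw)  # prints: 0.75 2.5 0.25
```

In this toy case three of the four observations are covered, so PICP is 0.75. Widening every interval would raise PICP but also raise MPIW and PINAW, which is the coverage-versus-width trade-off the abstract refers to.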

Item Type: Article
Subjects: H Social Sciences > HD Industries. Land use. Labor > HD9000-9999 Special industries and trades
Divisions: Faculty of Artificial Intelligence & Engineering (FAIE)
Depositing User: Ms Suzilawati Abu Samah
Date Deposited: 04 May 2026 01:03
Last Modified: 04 May 2026 01:03
URI: http://shdl.mmu.edu.my/id/eprint/15800