Citation
Chen, Lifei and Khalid, Othman Waleed and Tiang, Jun Jiat and Tiang, Sew Sun and Shin, Dong Youn and Choo, Yit Hong and Sharma, Abhishek and Lim, Wei Hong (2026) Uncertainty aware and explainable construction cost prediction using a hybrid probabilistic learning model. Scientific Reports, 16 (1). ISSN 2045-2322
Text
s41598-026-44904-8.pdf - Published Version. Restricted to repository staff only (5MB).
Abstract
This paper presents a unified probabilistic framework for construction cost forecasting, NGBoost-ETR (Natural Gradient Boosting with Extra Trees base learners), that delivers predictive accuracy, calibrated uncertainty, and SHAP-based interpretability. Rather than positioning novelty in algorithmic integration alone, the contribution lies in a systems-level design that jointly addresses three critical gaps in cost modeling: reliable interval calibration, model interpretability, and robustness across complex feature interactions in domains such as sustainable building. Trained on a real-world RSMeans dataset of 4477 samples, NGBoost-ETR achieves superior predictive performance (R² = 0.9866, RMSE = 0.4986, MSE = 0.2486, MAE = 0.2300, and MAPE = 1.4314%) compared to 10 baseline regressors and 9 NGBoost-based hybrids. Beyond point prediction, the model also demonstrates robust probabilistic calibration, validated through a comprehensive suite of six quantitative metrics. Specifically, the evaluation is based on Prediction Interval Coverage Probability (PICP), Prediction Interval Normalized Average Width (PINAW), Mean Prediction Interval Width (MPIW), Coverage Width-based Criterion (CWC), Negative Log-Likelihood (NLL), and Continuous Ranked Probability Score (CRPS). In head-to-head ablations, NGBoost-ETR attains the best interval efficiency (lowest PINAW, MPIW), overall calibration (lowest CWC), and distributional accuracy (lowest CRPS), with competitive NLL and acceptable coverage, outperforming variants that inflate coverage via impractically wide intervals. Crucially, not all hybrids are beneficial (e.g., some NGBoost pairings underperform their base learners), emphasizing that the ETR pairing is a validated choice rather than a generic integration. This work introduces a process innovation by embedding reliable uncertainty estimation into tree-based models without sacrificing performance, supporting greater resource efficiency in cost planning and estimation.
The resulting framework not only supports data-driven budgeting and tendering but also promotes transparent institutions and risk-aware decision-making in construction management.
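The interval and distributional metrics named in the abstract can be illustrated with a minimal sketch. It uses commonly cited textbook definitions of PICP, MPIW, and PINAW, plus closed-form NLL and CRPS under the assumption of a Gaussian predictive distribution. The abstract does not give the paper's exact formulations, nominal coverage level, or predictive distribution, so these definitions, the function names, and the toy numbers are all illustrative assumptions. CWC is omitted because its definition varies across the literature.

```python
import math


def interval_metrics(y_true, lower, upper):
    """Return (PICP, MPIW, PINAW) for paired interval bounds.

    Assumed definitions (not taken from the paper):
      PICP  = fraction of targets falling inside [lower, upper]
      MPIW  = mean interval width
      PINAW = MPIW divided by the range of the observed targets
    """
    n = len(y_true)
    covered = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    picp = covered / n
    mpiw = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    pinaw = mpiw / (max(y_true) - min(y_true))
    return picp, mpiw, pinaw


def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under N(mu, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)


def gaussian_crps(y, mu, sigma):
    """Closed-form CRPS of y under N(mu, sigma^2)."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))


# Hypothetical toy values, not data from the paper.
y_true = [10.0, 12.0, 14.0, 20.0]
lower = [9.0, 11.0, 15.0, 18.0]
upper = [11.0, 13.0, 17.0, 22.0]

picp, mpiw, pinaw = interval_metrics(y_true, lower, upper)
print(picp, mpiw, pinaw)  # 0.75 2.5 0.25
```

In this toy case three of the four targets fall inside their intervals, so coverage is 0.75. Widening every interval would raise PICP but also raise MPIW and PINAW, which is the coverage-versus-width trade-off the abstract refers to when it mentions variants that inflate coverage with impractically wide intervals.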
| Item Type: | Article |
|---|---|
| Subjects: | H Social Sciences > HD Industries. Land use. Labor > HD9000-9999 Special industries and trades |
| Divisions: | Faculty of Artificial Intelligence & Engineering (FAIE) |
| Depositing User: | Ms Suzilawati Abu Samah |
| Date Deposited: | 04 May 2026 01:03 |
| Last Modified: | 04 May 2026 01:03 |
| URI: | http://shdl.mmu.edu.my/id/eprint/15800 |
