Hybrid Machine Learning Architectures for Emergency Triage: A Systematic Review of Predictive Performance and the Complexity Gradient

Citation

Ullah, Junaid and Ramasamy, R Kanesaraj and Rajendran, Venushini (2026) Hybrid Machine Learning Architectures for Emergency Triage: A Systematic Review of Predictive Performance and the Complexity Gradient. BioMedInformatics, 6 (2). p. 21. ISSN 2673-7426

[img] Text
biomedinformatics-06-00021-v3.pdf - Published Version
Restricted to Repository staff only

Download (874kB)

Abstract

Background: Emergency triage systems using machine learning traditionally rely on structured tabular data (vital signs), creating a “contextual blind spot” that ignores diagnostic information embedded in unstructured clinical narratives. Hybrid AI models that fuse tabular and text data may improve predictive discrimination, but the magnitude and conditions under which fusion adds value remain unclear. Methods: Five databases (PubMed, Scopus, Web of Science, IEEE Xplore, ACM Digital Library) were searched from 1 January 2015 to 15 December 2025. Eligible studies employed Hybrid AI models integrating structured and unstructured emergency department data with quantitative baseline comparisons. Twenty-five studies (N ≈ 4.8 million encounters) met inclusion criteria. We extracted marginal performance gains (ΔAUC), calibration metrics, and demographic reporting. Synthesis followed SWiM principles with subgroup meta-regression testing our novel “Complexity Gradient” hypothesis. Results: Hybrid models demonstrated superior discrimination compared to tabular baselines, with effect magnitude dependent on clinical task complexity. Low-complexity tasks (tachycardia prediction) showed minimal gains (median ΔAUC + 0.036, IQR: 0.02–0.05), while high-complexity tasks (hypoxia, sepsis) demonstrated substantial improvement (median ΔAUC + 0.111, IQR: 0.09–0.13). Meta-regression confirmed complexity significantly moderated effect size (R2 = 0.42, p = 0.003). Only 12% (3/25) of studies reported calibration metrics (Brier scores: 0.089–0.142). Zero studies stratified performance by race/ethnicity; 88% (22/25) failed to report training data demographics. Discussion: The complexity gradient framework explains when multimodal fusion adds predictive value: tasks where diagnostic signal resides in narrative features (temporality, negation) rather than physiological measurements. However, systematic absence of calibration reporting and fairness auditing prevents clinical deployment. Seventy-two percent of studies had high risk of bias in the analysis domain due to retrospective designs without temporal validation. Conclusions: Hybrid triage models show promise for complex diagnostic tasks but require mandatory calibration reporting and demographic performance stratification before clinical implementation. We propose minimum reporting standards including Brier scores, race-stratified metrics, and temporal validation protocols.

Item Type: Article
Uncontrolled Keywords: machine learning, emergency triage, multimodal fusion, natural language processing, systematic review, algorithmic fairness, PRISMA 2020
Subjects: Q Science > QA Mathematics > QA71-90 Instruments and machines
Divisions: Faculty of Computing and Informatics (FCI)
Depositing User: Ms Suzilawati Abu Samah
Date Deposited: 05 Jun 2026 08:09
Last Modified: 05 Jun 2026 08:09
URII: http://shdl.mmu.edu.my/id/eprint/16067

Downloads

Downloads per month over past year

View ItemEdit (login required)