Citation
Bibi, Maryam and Rehman, Zahoor-Ur and Awan, Khalid Mahmood and Kamal, Shahid (2026) HiCoBERT: a hierarchical transformer-based framework for Pakistan legal judgment section segmentation with multi-level XAI insights. Scientific Reports, 16 (1). ISSN 2045-2322
Text
s41598-026-45259-w.pdf - Published Version (4MB). Restricted to Repository staff only.
Abstract
In many jurisdictions, legal judgments are written as long continuous narratives in which the functional progression of facts, issues, reasoning, and decision is expressed implicitly rather than through explicit section boundaries. Although recent advances in Transformer-based language models and large language models have significantly improved automated text analysis, these models still struggle with long judicial documents. Standard architectures encode documents as flat token sequences, do not explicitly model dependencies between discourse segments, and are constrained by input token limits, making it difficult to capture the ordered functional structure underlying judicial decisions. To address this limitation, we propose HiCoBERT, a hierarchically contextualized Transformer framework for segmenting functional sections in legal judgments. The model represents a judgment as an ordered sequence of contiguous text segments (fixed-length chunks). It first encodes intra-segment semantics and then models document-level dependencies across segments. This hierarchical design captures the logical flow of judicial decisions, enabling the efficient processing of lengthy judicial documents. Experiments on LeJA, a dataset of Supreme Court of Pakistan judgments curated for this study, show that the proposed framework achieves 0.80 accuracy, 0.70 macro-F1, and 0.79 span-F1, outperforming strong long-document baselines including Longformer, BigBird, and LongT5, as well as structured models such as LegalBERT+CRF. Additional comparisons with modern prompted LLMs (GPT-4o, Gemini Pro, DeepSeek-V3) further contextualize the performance. Cross-dataset evaluation on LegalSeg, a publicly available benchmark of annotated Indian legal judgments, demonstrates robustness under jurisdiction shift, while explainability analyses highlight the importance of hierarchical contextualization.
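As a rough illustration of the segment-level view described in the abstract, the sketch below splits a judgment into ordered, fixed-length chunks and merges per-chunk labels into contiguous functional sections. It is not the authors' implementation: the function names `chunk_tokens` and `labels_to_spans`, the chunk length, the whitespace tokenization, and the hand-written labels (standing in for model predictions) are all assumptions made for this example.

```python
from itertools import groupby


def chunk_tokens(tokens, chunk_len):
    """Split a token list into ordered, contiguous chunks of at most chunk_len tokens."""
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    return [tokens[i:i + chunk_len] for i in range(0, len(tokens), chunk_len)]


def labels_to_spans(labels):
    """Merge runs of identical per-chunk labels into (label, start, end) spans.

    start and end are chunk indices; end is exclusive.
    """
    spans = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        spans.append((label, start, start + length))
        start += length
    return spans


# Toy "judgment" of 12 whitespace tokens (hypothetical text).
tokens = "the appellant filed a suit the court holds that appeal is dismissed".split()
chunks = chunk_tokens(tokens, 3)

# Hypothetical per-chunk labels; a real system would predict these
# after encoding each chunk and contextualizing it across the document.
labels = ["facts", "facts", "reasoning", "decision"]
spans = labels_to_spans(labels)
```

Here `chunks` holds four three-token segments in document order, and `spans` groups the first two chunks into a single "facts" section followed by one-chunk "reasoning" and "decision" sections. Section boundaries in this toy setup can only fall on chunk boundaries, which is one consequence of working with fixed-length segments.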
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Legal NLP, Rhetorical section segmentation, Legal judgment segmentation |
| Subjects: | K Law > K Law (General) |
| Divisions: | Faculty of Artificial Intelligence & Engineering (FAIE) |
| Depositing User: | Ms Rosnani Abd Wahab |
| Date Deposited: | 04 May 2026 05:40 |
| Last Modified: | 07 May 2026 08:44 |
| URI: | http://shdl.mmu.edu.my/id/eprint/15884 |
