Citation
Koay, Xin Kuang and Ong, Lee Yeng and Goh, Pey Yun (2026) Structure-Aware Chunking for Complex Tables in Retrieval-Augmented Generation Systems. Emerging Science Journal, 10 (1). pp. 184-205. ISSN 2610-9182
Abstract
Retrieval-Augmented Generation (RAG) is a hybrid method that combines information retrieval with large language models to generate context-aware, factually grounded responses. However, a RAG system relies heavily on well-structured input data to generate accurate and contextually relevant responses. Documents with complex table layouts pose significant challenges, as most chunking strategies are text-centric and often overlook table-rich documents containing multi-column and multi-row structures. Hence, this study proposes a customized structure-aware chunking framework specifically designed for university course documents containing multi-column, multi-row tables with nested headers. The framework employs Camelot for high-fidelity table extraction, followed by customized logic that constructs semantically coherent chunks by preserving academic term, subject name, credit hour, and category. This prevents semantic fragmentation during retrieval. The proposed method is evaluated using the RAGAS framework and compared against several baselines using standard parsing and chunking techniques. Results show that the proposed approach achieves the highest answer accuracy of 0.73 and substantially improves retrieval relevance and contextual precision. These findings demonstrate the framework’s effectiveness in handling structure-dependent academic queries. This study highlights that attention to both parsing quality and chunking strategy is essential to retain semantic relationships in table-rich documents, offering a practical improvement for RAG systems in structurally complex scenarios.
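The chunking idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_chunks`, the four-column row layout, and the sample course rows are all assumptions made for illustration. The rows stand in for what a table extractor such as Camelot might return for a course table with merged cells. Blank term or category cells inherit the value above them, so each chunk carries its academic term, category, subject name, and credit hours together.

```python
def build_chunks(rows: list[list[str]]) -> list[str]:
    """Build one self-describing text chunk per subject row.

    Assumed row layout (hypothetical): [term, category, subject, credit_hours].
    Empty term or category cells, typical of merged cells in an extracted
    table, are filled with the most recent non-empty value above them.
    """
    chunks = []
    term = ""
    category = ""
    for raw_term, raw_category, subject, credits in rows:
        # Forward-fill the merged header-like cells.
        term = raw_term.strip() or term
        category = raw_category.strip() or category
        if not subject.strip():
            continue  # skip rows that carry no subject
        chunks.append(
            f"Term: {term} | Category: {category} | "
            f"Subject: {subject.strip()} | Credit hours: {credits.strip()}"
        )
    return chunks


# Made-up sample rows, not taken from the study's documents.
rows = [
    ["Year 1 Trimester 1", "Core", "Programming Fundamentals", "4"],
    ["", "", "Discrete Mathematics", "3"],
    ["", "Elective", "Academic English", "2"],
    ["Year 1 Trimester 2", "Core", "Data Structures", "4"],
]
chunks = build_chunks(rows)
for chunk in chunks:
    print(chunk)
```

Because every chunk repeats its own term and category, a retriever can match a query such as "core subjects in Year 1 Trimester 1" without needing neighboring rows for context.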
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Large language model, chunking |
| Subjects: | Q Science > QA Mathematics > QA71-90 Instruments and machines > QA75.5-76.95 Electronic computers. Computer science |
| Divisions: | Faculty of Information Science and Technology (FIST) |
| Depositing User: | Ms Rosnani Abd Wahab |
| Date Deposited: | 03 Mar 2026 02:26 |
| Last Modified: | 03 Mar 2026 02:26 |
| URI: | http://shdl.mmu.edu.my/id/eprint/15426 |
