Exploring AI-driven approaches for unstructured document analysis and future horizons

Citation

Mahadevkar, Supriya V. and Patil, Shruti and Kotecha, Ketan and Lim, Way Soong and Choudhury, Tanupriya (2024) Exploring AI-driven approaches for unstructured document analysis and future horizons. Journal of Big Data, 11 (1). ISSN 2196-1115

Exploring AI-driven approaches for unstructured document analysis and future horizons.pdf - Published Version
Restricted to Repository staff only

Abstract

In the current industrial landscape, a significant number of sectors are grappling with the challenges posed by unstructured data, which incurs financial losses amounting to millions annually. If harnessed effectively, this data has the potential to substantially boost operational efficiency. Traditional methods for extracting information have their limitations; however, solutions powered by artificial intelligence (AI) could provide a more fitting alternative. There is an evident gap in scholarly research concerning a comprehensive evaluation of AI-driven techniques for the extraction of information from unstructured content. This systematic literature review aims to identify, assess, and deliberate on prospective research directions within the field of unstructured document information extraction. It has been observed that prevailing extraction methods primarily depend on static patterns or rules, often proving inadequate when faced with complex document structures typically encountered in real-world scenarios, such as medical records. Datasets currently available to the public suffer from low quality and are tailored for specific tasks only. This underscores an urgent need for developing new datasets that accurately reflect complex issues encountered in practical settings. The review reveals that AI-based techniques show promise in autonomously extracting information from diverse unstructured documents, encompassing both printed and handwritten text. Challenges arise, however, when dealing with varied document layouts. Proposing a framework through hybrid AI-based approaches, this review envisions processing a high-quality dataset for automatic information extraction from unstructured documents. Additionally, it emphasizes the importance of collaborative efforts between organizations and researchers to address the diverse challenges associated with unstructured data analysis.

Item Type: Article
Uncontrolled Keywords: Artificial intelligence, Unstructured document processing
Subjects: Q Science > Q Science (General) > Q300-390 Cybernetics
Divisions: Faculty of Engineering and Technology (FET)
Depositing User: Ms Nurul Iqtiani Ahmad
Date Deposited: 01 Aug 2024 02:42
Last Modified: 01 Aug 2024 02:42
URI: http://shdl.mmu.edu.my/id/eprint/12699
