Citation
Shahariar Parvez, A. H. M. and Samiul Islam, Md. and Al Farid, Fahmid and Yeasmin, Tashida and Islam Akash, Md. Monirul and Azam, Md. Shafiul and Uddin, Jia and Abdul Karim, Hezerul (2025) Enhancing Bangla handwritten character recognition using Vision Transformers, VGG-16, and ResNet-50: a performance analysis. Frontiers in Big Data, 8. ISSN 2624-909X
Text
fdata-8-1682984.pdf - Published Version (2MB). Restricted to Repository staff only.
Abstract
Bangla Handwritten Character Recognition (BHCR) remains challenging due to complex alphabets and handwriting variations. In this study, we present a comparative evaluation of three deep learning architectures, namely Vision Transformer (ViT), VGG-16, and ResNet-50, on the CMATERdb 3.1.2 dataset comprising 24,000 images of 50 basic Bangla characters. Our work highlights the effectiveness of ViT in capturing global context and long-range dependencies, leading to improved generalization. Experimental results show that ViT achieves a state-of-the-art accuracy of 98.26%, outperforming VGG-16 (94.54%) and ResNet-50 (93.12%). We also analyze model behavior, discuss overfitting in CNNs, and provide insights into character-level misclassifications. This study demonstrates the potential of transformer-based architectures for robust BHCR and offers a benchmark for future research.
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | deep learning, Bangla handwritten character recognition, optical character recognition, convolutional neural network, Vision Transformer (ViT), VGG-16, ResNet-50 |
| Subjects: | Q Science > QA Mathematics > QA71-90 Instruments and machines |
| Divisions: | Faculty of Artificial Intelligence & Engineering (FAIE) |
| Depositing User: | Ms Suzilawati Abu Samah |
| Date Deposited: | 12 Dec 2025 01:31 |
| Last Modified: | 12 Dec 2025 01:31 |
| URI: | http://shdl.mmu.edu.my/id/eprint/15073 |
