A modified attention-based transformer ensemble for automated Bengali aggressive text identification

Citation

Rashid, Suhana Binta and Piyas, Bibhas Roy Chowdhury and Farid, Fahmid Al and Forhad, Md. Shafiul Alam and Rahman, Sadia and Preenon, Bijoy Roy Chowdhury and Abdul Karim, Hezerul and Arefin, Mohammad Shamsul and Miah, Abu Saleh Musa and Hasan, Mohammad Kamrul (2026) A modified attention-based transformer ensemble for automated Bengali aggressive text identification. Discover Applied Sciences, 8 (2). ISSN 3004-9261

Abstract

Aggression detection, the identification of offensive or harmful expressions in online communication, is crucial because of its societal implications and its impact on online safety. Detecting aggression in Bengali text is particularly challenging because of the language’s complex morphology and the scarcity of annotated datasets and computational tools. Although the task has been studied extensively in high-resource languages, research on Bengali, a resource-constrained language, remains limited. To address this gap, we developed a new dataset, BOLT (Bangla Offensive and Lethal Texts), consisting of 4027 annotated instances classified into four classes by severity of aggression: no aggression, hate speech, vandalism, and atrocity. To optimize model performance, we propose a modified attention mechanism within transformers. Unlike conventional self-attention, which computes pairwise interactions between all tokens and treats their importance uniformly, our approach introduces a learnable token-wise weight matrix to assess the relevance of each token for aggressive text classification. The attention scores are obtained by multiplying the transformer’s token embeddings by this matrix, generating task-specific weights that better capture contextual significance. Moreover, we use a weighted ensemble of the optimized transformer models to combine their individual strengths. Experimental results show that the proposed approach outperforms existing methods and baseline models, achieving 87% on both accuracy and weighted F1-score. The dataset and method support further work in Bengali NLP, content moderation tools, and safer digital platforms through automated aggression detection.
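
The two mechanisms named in the abstract, token-wise attention weighting and a weighted ensemble, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the shape of the weight matrix, the normalization, or the ensemble weights, so the sketch assumes a single weight vector `w` (one column of a learnable matrix), a softmax over token scores, and a weighted average of per-model class probabilities. The function names and all numeric values are hypothetical.

```python
import math

# Class labels as listed in the abstract.
CLASSES = ["no aggression", "hate speech", "vandalism", "atrocity"]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def token_weighted_pooling(embeddings, w):
    """Score each token embedding against a weight vector and pool.

    embeddings: list of n token vectors, each of dimension d
                (stand-ins for a transformer's output embeddings).
    w:          list of d weights; assumed here to be learned during
                fine-tuning, but passed in as fixed numbers.
    Returns (alphas, pooled): per-token attention weights that sum to 1,
    and the attention-weighted sum of the token vectors.
    """
    scores = [sum(h_j * w_j for h_j, w_j in zip(h, w)) for h in embeddings]
    alphas = softmax(scores)
    dim = len(embeddings[0])
    pooled = [sum(a * h[j] for a, h in zip(alphas, embeddings)) for j in range(dim)]
    return alphas, pooled


def weighted_ensemble(prob_list, model_weights):
    """Combine per-model class probabilities by a weighted average.

    prob_list:     one probability vector per model.
    model_weights: one non-negative weight per model (assumed values,
                   e.g. derived from validation scores).
    Returns (combined, index of the highest-probability class).
    """
    total = sum(model_weights)
    n_classes = len(prob_list[0])
    combined = [
        sum(w * p[c] for w, p in zip(model_weights, prob_list)) / total
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=combined.__getitem__)
    return combined, best


if __name__ == "__main__":
    # Toy 2-dimensional embeddings for three tokens (made-up numbers).
    tokens = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.4]]
    alphas, pooled = token_weighted_pooling(tokens, w=[1.5, -0.5])
    print("token weights:", alphas)

    # Two hypothetical models' class probabilities for one text.
    probs = [[0.7, 0.1, 0.1, 0.1], [0.2, 0.6, 0.1, 0.1]]
    combined, best = weighted_ensemble(probs, model_weights=[1.0, 3.0])
    print("prediction:", CLASSES[best])
```

In this sketch the softmax makes the token weights comparable across texts of different lengths, and the pooled vector would feed a classification layer. A model with a larger ensemble weight pulls the combined prediction toward its own output.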

Item Type: Article
Uncontrolled Keywords: Sentiment analysis in Bengali
Subjects: H Social Sciences > HN Social history and conditions. Social problems. Social reform > HN50-995 By region or country
Divisions: Faculty of Artificial Intelligence & Engineering (FAIE)
Depositing User: Ms Rosnani Abd Wahab
Date Deposited: 27 Feb 2026 07:43
Last Modified: 27 Feb 2026 07:43
URI: http://shdl.mmu.edu.my/id/eprint/15361
