Weighted Plain Bayes-Based English Text Classification in Multilingual Interactive Environments

Citation

Hu, Wen Li and Zhang, Daniel (2024) Weighted Plain Bayes-Based English Text Classification in Multilingual Interactive Environments. Journal of Network Intelligence, 9 (4). pp. 1984-1999. ISSN 2414-8105

Text
04.JNI-S-2023-12-019.pdf - Published Version
Restricted to Repository staff only

Abstract

As globalization accelerates, communication between countries and nationalities is becoming increasingly frequent. Against this background, this paper proposes a weighted plain (naive) Bayes-based English text classification algorithm for multilingual interactive environments (ETWNBA) to address the low classification efficiency and long classification time of current English text classification algorithms. First, an orthogonal transformation is used to eliminate linear relationships among continuous attributes; the conditional probabilities of the weighted discrete attributes and of the orthogonally transformed continuous attributes are then computed separately, which improves the generalization ability of the weighted plain Bayes algorithm (WNBA). Second, because existing classification algorithms do not consider how the position of a word affects the text, the proposed algorithm introduces inter-class and intra-class discretization factors for feature words and assigns different weights to English words at different positions. This strengthens its ability to distinguish the class-distribution information of feature words and enables accurate classification of English text. Experimental results show that the Accuracy, Precision, Recall and F1 values of ETWNBA exceed 90% on every experimental dataset, and that classifying 100 texts takes 210 s, indicating high classification efficiency and a short classification time.
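The position-weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' ETWNBA: the two positions ("title" and "body"), the weight values, the Laplace smoothing, and the function names `train` and `classify` are all assumptions made for illustration, and the orthogonal transformation and the inter-class and intra-class factors are not reproduced here.

```python
import math
from collections import defaultdict

# Hypothetical position weights (an assumption, not taken from the paper):
# a word in the title counts three times as much as a word in the body.
POSITION_WEIGHTS = {"title": 3.0, "body": 1.0}


def train(docs):
    """Build a position-weighted naive Bayes model.

    docs is a list of (label, fields) pairs, where fields maps a position
    name ("title" or "body") to a list of word tokens.
    """
    class_docs = defaultdict(int)                           # documents per class
    word_weight = defaultdict(lambda: defaultdict(float))   # weighted word counts
    vocab = set()
    for label, fields in docs:
        class_docs[label] += 1
        for position, words in fields.items():
            weight = POSITION_WEIGHTS[position]
            for word in words:
                word_weight[label][word] += weight
                vocab.add(word)
    return class_docs, word_weight, vocab


def classify(model, fields):
    """Return the class with the highest weighted log-posterior score."""
    class_docs, word_weight, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)    # log prior
        total = sum(word_weight[label].values())
        for position, words in fields.items():
            weight = POSITION_WEIGHTS[position]
            for word in words:
                # Laplace-smoothed conditional probability of the word.
                p = (word_weight[label].get(word, 0.0) + 1.0) / (total + len(vocab))
                # The position weight scales the word's log-likelihood term.
                score += weight * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    training = [
        ("sports", {"title": ["football", "match"], "body": ["team", "won", "the", "game"]}),
        ("tech", {"title": ["new", "phone"], "body": ["the", "chip", "is", "fast"]}),
    ]
    model = train(training)
    print(classify(model, {"title": ["football"], "body": ["the", "game"]}))
```

In this sketch the position weight is applied twice: once when accumulating counts during training and once as an exponent on the conditional probability at prediction time. A real implementation would likely tune or learn these weights rather than fix them by hand.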

Item Type: Article
Uncontrolled Keywords: Text categorization; Weighted plain Bayes; Orthogonal transformations; Discrete attributes; Multilingual interaction
Subjects: Q Science > QA Mathematics > QA299.6-433 Analysis
Divisions: Faculty of Information Science and Technology (FIST)
Depositing User: Ms Nurul Iqtiani Ahmad
Date Deposited: 03 Jan 2025 06:06
Last Modified: 03 Jan 2025 06:09
URI: http://shdl.mmu.edu.my/id/eprint/13312
