Preprocessing Impact on Sentiment Analysis Performance on Malay Social Media Text


Ho, Ian H. J. and Goh, Hui Ngo and Tan, Yi Fei (2022) Preprocessing Impact on Sentiment Analysis Performance on Malay Social Media Text. Journal of System and Management Sciences, 12 (5). pp. 73-90. ISSN 1816-6075, 1818-0523


Preprocessing is widely accepted as a requirement for achieving the best performance in machine learning tasks across natural language processing. For a low-resource language like Malay, the majority of recent studies have implemented generalized preprocessing methods, such as stopword removal using a Malay-translated English stopword list and simple text normalization algorithms. This study explores the extent to which preprocessing affects supervised sentiment classification in the social media domain, primarily on Malay text, while considering factors such as general vs. domain-specific stopword removal, feature vectorizers, and dataset size. The impact of normalization and stopword removal was found to be minor: surprisingly, sentiment classification on an unprocessed dataset achieved an F1-score in the 85% range, similar to that of a processed dataset. The choice of feature vectorization method and type of classification model had a greater positive impact on classification performance.
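
The two preprocessing steps named in the abstract, text normalization and stopword removal, can be sketched as follows. This is a minimal illustration only and not the authors' pipeline: the stopword set, the normalization map, the sample sentence, and the function names `tokenize`, `preprocess`, and `bag_of_words` are all hypothetical and chosen for the example. The count-based vectorizer stands in for whichever feature vectorizers the study compared.

```python
import re
from collections import Counter

# Hypothetical, tiny illustrative resources; NOT the lists used in the study.
STOPWORDS = {"yang", "dan", "di", "ini", "itu"}
NORMALIZATION = {"yg": "yang", "sgt": "sangat", "x": "tidak", "tak": "tidak"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def preprocess(text, normalize=True, remove_stopwords=True):
    """Optionally expand informal spellings and drop stopwords."""
    tokens = tokenize(text)
    if normalize:
        # Map each informal token to its standard form when one is known.
        tokens = [NORMALIZATION.get(t, t) for t in tokens]
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def bag_of_words(tokens):
    """Simple count-based feature vector, keyed by token."""
    return Counter(tokens)


raw = "Makanan ini sgt sedap dan x mahal"  # made-up sample sentence
unprocessed = preprocess(raw, normalize=False, remove_stopwords=False)
processed = preprocess(raw)

print(unprocessed)
print(processed)
```

Toggling the `normalize` and `remove_stopwords` flags produces the unprocessed and processed variants of a dataset, which could then be fed to the same vectorizer and classifier to compare scores in the manner the abstract describes.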

Item Type: Article
Uncontrolled Keywords: Malay text, pre-processing, sentiment analysis, opinion mining, domain specific.
Subjects: Q Science > QA Mathematics > QA71-90 Instruments and machines > QA75.5-76.95 Electronic computers. Computer science
Divisions: Faculty of Computing and Informatics (FCI); Faculty of Engineering (FOE)
Depositing User: Ms Nurul Iqtiani Ahmad
Date Deposited: 01 Dec 2022 04:05
Last Modified: 01 Dec 2022 04:05

