Citation
Ho, Ian H. J. and Goh, Hui Ngo and Tan, Yi Fei (2022) Inducing a Malay Lexicon from an Unlabelled Dataset Using Word Embeddings. In: Advances and Trends in Artificial Intelligence. Theory and Practices in Artificial Intelligence. Springer Link, pp. 382-394. Full text not available from this repository.
Abstract
The Malay language in Malaysia is commonly mixed with English and slang words, especially in social media text. In the past, researchers would develop Malay-English lexicons by translating widely accepted English lexicons and carrying over the labels. However, translation may produce words that carry no sentiment or are rarely used in their translated form. In this paper, an end-to-end framework involving a two-phase sentiment classification method is proposed. The first phase presents a method where a domain-specific lexicon is induced from word embeddings trained on an unlabelled dataset. It was found that the induced lexicon-based classifier outperformed general and translated lexicons. However, lexicon-based methods still suffer from low recall when documents contain sentiment words not found in the lexicon. This can be mitigated by the second phase of the framework, where a supervised classifier is trained on a filtered output of the induced lexicon-based method to produce a model that can classify new and unseen documents. This framework was shown to be applicable to any Malay or English unlabelled dataset through results with an F1-score of 0.81 on the MALAYA Twitter dataset and 0.82 on the IMDB movie review dataset.
Item Type: | Book Section |
---|---|
Uncontrolled Keywords: | Lexicon induction, Domain-specific, Sentiment analysis |
Subjects: | Q Science > Q Science (General) > Q300-390 Cybernetics |
Divisions: | Faculty of Computing and Informatics (FCI) |
Depositing User: | Ms Nurul Iqtiani Ahmad |
Date Deposited: | 07 Oct 2022 01:28 |
Last Modified: | 07 Oct 2022 01:28 |
URI: | http://shdl.mmu.edu.my/id/eprint/10484 |