Citation
Tan, Kian Long (2023) Deep learning-based sentiment analysis using RoBERTa and sequence models. Masters thesis, Multimedia University. Full text not available from this repository.
Abstract
Sentiment analysis categorises text as positive, negative, or neutral and is crucial for understanding public opinion. As online platforms grow, sentiment analysis becomes increasingly important for economic and political decision-making. However, it faces challenges due to lexical diversity and long-distance dependencies in texts. Existing methods, which rely on frequency-based descriptors, struggle to capture context and meaning. Attention models and sequence models have shown promise in natural language processing: attention mechanisms focus on important tokens, while sequence models encode temporal information. To address these challenges, the study proposes three hybrid deep learning models that combine attention models and sequence models. The attention model generates representative embeddings, while the sequence models capture dependencies within the embeddings. The hybrid models are RoBERTa-LSTM, RoBERTa-BiLSTM, and RoBERTa-GRU. In the RoBERTa-LSTM hybrid model, RoBERTa assigns dynamic attention masks to the text sequence and produces contextual word embeddings. LSTM captures long-distance dependencies, but only in one direction. To address the fluidity of natural language, RoBERTa-BiLSTM uses Bidirectional LSTM to encode dependencies in both directions. To overcome vanishing gradient issues, LSTM and BiLSTM utilise a complex architecture with three gates. Alternatively, the RoBERTa-GRU hybrid model explores GRU, a simpler sequence model with two gates, offering faster computation. To further enhance sentiment analysis, an ensemble model combines predictions from all hybrid models using averaging ensemble and majority voting. The ensemble model achieves its highest accuracy on each dataset: 94.85% on the IMDb dataset, 86.34% on the Twitter US Airline Sentiment dataset, and 89.59% on the Sentiment140 dataset.
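The abstract names two ways of combining the hybrid models' predictions, averaging and majority voting, without giving details. The following is a minimal sketch of both in plain Python, not the thesis implementation. The function names, the class ordering (0 = negative, 1 = neutral, 2 = positive), and the probability values are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def average_ensemble(prob_sets):
    """Average class probabilities across models, then take the argmax per sample."""
    n_models = len(prob_sets)
    preds = []
    for i in range(len(prob_sets[0])):
        n_classes = len(prob_sets[0][i])
        mean = [sum(m[i][c] for m in prob_sets) / n_models for c in range(n_classes)]
        preds.append(argmax(mean))
    return preds


def majority_vote(prob_sets):
    """Each model votes for its own argmax class; the most common vote wins.

    On a tie, the class voted for by the earliest model in the list is kept.
    """
    preds = []
    for i in range(len(prob_sets[0])):
        votes = [argmax(m[i]) for m in prob_sets]
        preds.append(Counter(votes).most_common(1)[0][0])
    return preds


# Hypothetical per-sample class probabilities from three models (assumed values).
lstm_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
bilstm_probs = [[0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
gru_probs = [[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]

all_probs = [lstm_probs, bilstm_probs, gru_probs]
avg_preds = average_ensemble(all_probs)
vote_preds = majority_vote(all_probs)
print(avg_preds, vote_preds)
```

In this made-up example the two schemes disagree on the first sample: two models narrowly prefer class 0, so voting picks class 0, while one confident prediction for class 1 pulls the averaged probabilities towards class 1. How the thesis reconciles or reports the two schemes is not stated in the abstract.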
Item Type: | Thesis (Masters) |
---|---|
Additional Information: | Call No.: Q325.73 .T36 2023 |
Uncontrolled Keywords: | Deep learning (Machine learning) |
Subjects: | Q Science > Q Science (General) > Q300-390 Cybernetics |
Divisions: | Faculty of Information Science and Technology (FIST) |
Depositing User: | Ms Nurul Iqtiani Ahmad |
Date Deposited: | 27 Aug 2024 08:37 |
Last Modified: | 27 Aug 2024 08:37 |
URI: | http://shdl.mmu.edu.my/id/eprint/12862 |