Performance of Sentiment Classification on Tweets of Clothing Brands

Citation

Ng, Hu and Jalani, Muhammad Shafiq and Yap, Timothy Tzen Vun and Goh, Vik Tor (2022) Performance of Sentiment Classification on Tweets of Clothing Brands. Journal of Informatics and Web Engineering, 1 (1). pp. 16-22. ISSN 2821-370X

Abstract

Social media platforms such as Facebook, Instagram, LinkedIn, and Twitter ease the sharing of ideas, thoughts, videos, photos, and information through the building of virtual networks and communities. This has allowed companies and products to reach a wider audience in terms of marketing and advertising, and to gauge feedback from the public. This research investigates clothing brand mentions on Twitter to perform sentiment analysis on users’ thoughts on three clothing brands, namely Asos, Uniqlo and Topshop. The data is collected using the Python library Tweepy to access the Twitter streaming API. Following that, data pre-processing steps such as tokenization, filtering, stemming, and case normalization are performed to remove outliers. Then, TextBlob is applied to label the tweets into three classes, Positive, Negative and Neutral, based on their polarity. Word embeddings are also created using Word2Vec with TF-IDF. The word embeddings are fed into five classification models, namely Support Vector Machine (SVM), Naïve Bayes (NB), Random Forest (RF), Logistic Regression (LR) and Multilayer Perceptron (MLP), and their accuracies are compared. The models were trained and tested on a curated tweet dataset comprising 24,000 records covering the three clothing brands, using 50-50 and 70-30 train-test splits. Hyperparameter tuning was performed with GridSearchCV to find the best parameters for each classification model. Performance was measured with accuracy, precision, recall and F1-score. With the 50-50 train-test split, LR achieved the highest accuracy, scoring 82%, 87% and 87% on Asos, Uniqlo and Topshop respectively. With the 70-30 train-test split, LR again achieved the highest accuracy, scoring 85%, 90% and 90% for the three clothing brands respectively.
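As an illustration only, the polarity-based labelling and the four evaluation measures named in the abstract can be sketched in plain Python. The zero polarity threshold, the macro averaging over classes, and the function names `label_from_polarity` and `evaluate` are assumptions made for this sketch; the abstract does not state the cut-offs or the averaging scheme the authors used.

```python
def label_from_polarity(polarity, threshold=0.0):
    """Map a polarity score in [-1, 1] to one of three sentiment classes.

    The threshold of 0.0 is an assumption, not a value taken from the paper.
    """
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"


def evaluate(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Macro averaging (an unweighted mean over classes) is an assumption.
    """
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical polarity scores and classifier predictions, for illustration.
polarities = [0.6, -0.4, 0.0, 0.2]
y_true = [label_from_polarity(p) for p in polarities]
y_pred = ["Positive", "Neutral", "Neutral", "Positive"]
scores = evaluate(y_true, y_pred)
```

In practice the polarity scores would come from TextBlob and the predictions from the trained classifiers; the values above are made up solely to exercise the functions.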

Item Type: Article
Uncontrolled Keywords: Sentiment analysis, classification, machine learning, clothing brand
Subjects: Q Science > QA Mathematics > QA71-90 Instruments and machines > QA75.5-76.95 Electronic computers. Computer science
Divisions: Faculty of Computing and Informatics (FCI)
Depositing User: Ms Nurul Iqtiani Ahmad
Date Deposited: 14 Apr 2023 01:22
Last Modified: 14 Apr 2023 01:22
URI: http://shdl.mmu.edu.my/id/eprint/10692
