Aspect-Based Sentiment Analysis on Twitter Using Logistic Regression with FastText Feature Expansion
Abstract
Social media has recently become widely used, especially in Indonesia, as a place for people to express themselves through text, images, audio, or video. Twitter is one of the social media platforms favored by people across a wide range of ages. Besides the features common to most social media, Twitter has a distinctive one: users send and read short text messages (tweets) limited to a fixed number of characters. Tweets that discuss a particular product can therefore serve as input for companies developing that product. This research uses tweets about Telkomsel, analyzed along two aspects: signal and service. Aspect-based sentiment analysis of Telkomsel was performed using Logistic Regression with FastText feature expansion, which reduces vocabulary mismatch between tweets so that the classification stage can perform better. In addition, the Synthetic Minority Oversampling Technique (SMOTE) was applied to address class imbalance. The test results show that feature expansion improves the F1-Score for both the signal and the service aspect. For the signal aspect, the F1-Score increased by 3.33% over the baseline, reaching 96.48%. For the service aspect, the F1-Score increased by 12.91% over the baseline, reaching 95.57%.
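The abstract does not specify how the FastText feature expansion or the SMOTE step is implemented. The sketch below shows one common formulation, offered as an assumption rather than as this paper's exact procedure: word vectors are used to build a similarity corpus (each word mapped to its most similar words), and a feature that is zero in a tweet borrows the weight of the most similar word that does occur in that tweet. SMOTE is shown in its basic form, interpolating between a minority sample and one of its nearest minority neighbors. The embedding values, the example words, and the helper names (`build_similarity_corpus`, `expand_features`, `smote`) are hypothetical; a real pipeline would load vectors from a trained FastText model and then fit a Logistic Regression classifier on the expanded, rebalanced vectors, which is omitted here.

```python
import math
import random

# Hypothetical toy vectors standing in for a trained FastText model.
# Real FastText vectors are much higher-dimensional (e.g. 100 or 300 dims).
EMBEDDINGS = {
    "sinyal": [0.90, 0.10, 0.00],     # "signal"
    "jaringan": [0.85, 0.15, 0.05],   # "network"
    "lemot": [0.10, 0.90, 0.10],      # slang for "slow"
    "lambat": [0.12, 0.88, 0.05],     # "slow"
    "pelayanan": [0.05, 0.10, 0.95],  # "service"
    "layanan": [0.07, 0.12, 0.93],    # "service"
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_similarity_corpus(embeddings, top_n):
    """Map each word to its top_n most similar other words, best first."""
    corpus = {}
    for word, vec in embeddings.items():
        ranked = sorted(
            (other for other in embeddings if other != word),
            key=lambda other: cosine(vec, embeddings[other]),
            reverse=True,
        )
        corpus[word] = ranked[:top_n]
    return corpus


def expand_features(features, corpus):
    """Fill zero-valued features using the most similar word present in the tweet.

    `features` maps every vocabulary word to its weight in one tweet
    (for example a TF-IDF value). Lookups read the original vector, so an
    expanded value is never reused to expand another feature.
    """
    expanded = dict(features)
    for word, value in features.items():
        if value != 0:
            continue
        for similar in corpus.get(word, []):
            if features.get(similar, 0) != 0:
                expanded[word] = features[similar]
                break
    return expanded


def smote(minority, n_new, k=5, seed=None):
    """Generate n_new synthetic minority samples (basic SMOTE interpolation).

    Each new sample lies on the segment between a randomly chosen minority
    point and one of its k nearest minority neighbors (Euclidean distance).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbors = sorted(
            (m for m in minority if m is not base),
            key=lambda m: math.dist(base, m),
        )[:k]
        partner = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, partner)])
    return synthetic


if __name__ == "__main__":
    corpus = build_similarity_corpus(EMBEDDINGS, top_n=1)
    tweet = {word: 0.0 for word in EMBEDDINGS}
    tweet.update({"sinyal": 0.7, "lemot": 0.4})  # hypothetical TF-IDF weights
    expanded = expand_features(tweet, corpus)
    print(expanded["jaringan"])  # 0.7, borrowed from "sinyal"
    print(smote([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]], n_new=3, k=2, seed=0))
```

With `top_n=1`, "pelayanan" stays at zero because its closest word, "layanan", is also absent from the tweet. A larger `top_n` fills more features but can borrow weight from a less related word: with these toy vectors, `top_n=2` fills "pelayanan" from "lemot". SMOTE would normally be applied to the training split only, so that synthetic samples do not leak into the evaluation data.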
Copyright (c) 2022 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright of each article belongs to the author.
- The author acknowledges that the RESTI Journal (System Engineering and Information Technology) is the first publisher, publishing the work under the Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the manuscript as published in this journal (e.g., posting it to the author's institutional repository or publishing it in a book), provided they acknowledge that the manuscript was first published in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal.