Aspect-Based Sentiment Analysis on Twitter Using Logistic Regression with FastText Feature Expansion

  • Hanif Reangga Alhakiem, Telkom University
  • Erwin Budi Setiawan, Telkom University
Keywords: aspect-based sentiment analysis, logistic regression, fasttext, feature expansion, twitter

Abstract

Social media is now widely used, especially by Indonesians, as a place for self-expression through text, images, audio, and video. Twitter is one of the platforms favored by people of diverse ages. It offers the features common to social media in general, with one distinguishing trait: users send and read text messages limited to a small number of characters. Tweets about a particular product can therefore serve companies as input for developing that product. This research uses tweet data on the topic of Telkomsel, divided into two aspects: signal and service. Aspect-based sentiment analysis of Telkomsel was carried out using Logistic Regression with FastText feature expansion, which reduces vocabulary mismatch in tweets so that the classification stage can perform better. In addition, the Synthetic Minority Oversampling Technique (SMOTE) was applied to overcome data imbalance. The test results show that feature expansion improves the F1-Score for both the signal and service aspects. For the signal aspect, the F1-Score increased by 3.33% from the baseline, with a value of 96.48%. For the service aspect, the F1-Score increased by 12.91% from the baseline, with a value of 95.57%.
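
The idea of feature expansion against vocabulary mismatch can be illustrated with a minimal sketch. It assumes a binary bag-of-words representation and a rule that switches on a zero-valued feature when one of its most similar words occurs in the tweet; the abstract does not spell out the exact rule, so this is an illustration and not the authors' implementation. The `SIMILAR` table is a hypothetical stand-in for nearest neighbours taken from a trained FastText model, and the function names `vectorize` and `expand`, the `top_k` parameter, and the example words are likewise assumptions.

```python
from typing import Dict, List

# Hypothetical similarity table standing in for FastText nearest neighbours.
# The words and their rankings are made up purely for illustration.
SIMILAR: Dict[str, List[str]] = {
    "sinyal": ["jaringan", "koneksi"],
    "lambat": ["lemot", "lelet"],
}


def vectorize(tokens: List[str], vocabulary: List[str]) -> List[int]:
    """Binary bag-of-words vector over a fixed vocabulary."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


def expand(tokens: List[str], vocabulary: List[str],
           similar: Dict[str, List[str]], top_k: int = 2) -> List[int]:
    """Set a zero-valued feature to 1 when one of its top_k similar
    words (assumed rule) occurs in the tweet."""
    present = set(tokens)
    vector = vectorize(tokens, vocabulary)
    for i, word in enumerate(vocabulary):
        neighbours = similar.get(word, [])[:top_k]
        if vector[i] == 0 and any(n in present for n in neighbours):
            vector[i] = 1
    return vector


vocabulary = ["sinyal", "lambat", "bagus"]
tweet = ["jaringan", "telkomsel", "lemot"]

baseline = vectorize(tweet, vocabulary)        # no vocabulary word matches
expanded = expand(tweet, vocabulary, SIMILAR)  # similar words fill the gaps
```

In this toy case the tweet shares no word with the vocabulary, so the baseline vector is all zeros, while the expanded vector activates the "sinyal" and "lambat" features. The expanded vectors would then be passed to the classifier, with oversampling such as SMOTE applied to the training split only.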

Published
2022-11-02
How to Cite
Alhakiem, H. R., & Setiawan, E. B. (2022). Aspect-Based Sentiment Analysis on Twitter Using Logistic Regression with FastText Feature Expansion. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 6(5), 840 - 846. https://doi.org/10.29207/resti.v6i5.4429
Section
Information Technology Articles