Feature Expansion Word2Vec for Sentiment Analysis of Public Policy in Twitter

  • Alvi Rahmy Royyan Telkom University
  • Erwin Budi Setiawan Telkom University
Keywords: sentiment analysis, feature expansion, word2vec, public policy

Abstract

Social media users, especially on Twitter, can freely express opinions and share other information in tweets about any topic, including responses to public policy. Each tweet is limited to 280 characters, which can lead to problems such as vocabulary mismatch. This study therefore applies Word2Vec feature expansion to address vocabulary mismatch. It develops a Twitter sentiment analysis system that uses Word2Vec feature expansion with the Logistic Regression (LR) and Support Vector Machine (SVM) classification algorithms, and compares it with a system that does not use feature expansion. The results show that Word2Vec feature expansion with the SVM classification algorithm increased system accuracy by up to 0.99%, reaching an accuracy of 78.99%.
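
The abstract does not describe the expansion procedure in detail. As a minimal sketch only, assuming a binary bag-of-words representation and a precomputed list of similar words per vocabulary term (in practice such a list might be drawn from a trained Word2Vec model), feature expansion could fill a zero-valued feature whenever one of its similar words appears in the tweet. The names `SIMILAR_WORDS`, `VOCABULARY`, `bag_of_words`, and `expand_features` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
# Hypothetical similarity table. In a real system this would be built from a
# trained Word2Vec model (for example, the top-N nearest neighbours per word).
SIMILAR_WORDS = {
    "policy": ["regulation", "rule"],
    "good": ["great", "nice"],
    "bad": ["poor", "awful"],
}

# Hypothetical fixed vocabulary that defines the feature positions.
VOCABULARY = ["policy", "good", "bad"]


def bag_of_words(tokens, vocabulary):
    """Return a binary bag-of-words vector over a fixed vocabulary."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


def expand_features(tokens, vocabulary, similar_words):
    """Set zero-valued features to 1 when a similar word occurs in the tweet."""
    present = set(tokens)
    vector = bag_of_words(tokens, vocabulary)
    for i, word in enumerate(vocabulary):
        if vector[i] == 0 and present.intersection(similar_words.get(word, [])):
            vector[i] = 1
    return vector


tweet = "the new regulation is awful".split()
baseline = bag_of_words(tweet, VOCABULARY)
expanded = expand_features(tweet, VOCABULARY, SIMILAR_WORDS)
print(baseline)  # [0, 0, 0]
print(expanded)  # [1, 0, 1]
```

In this toy case the tweet shares no word with the vocabulary, so the baseline vector is all zeros, while the expanded vector recovers the "policy" and "bad" features. Vectors of this kind would then be passed to a classifier such as LR or SVM.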

Published
2022-02-27
How to Cite
Alvi Rahmy Royyan, & Erwin Budi Setiawan. (2022). Feature Expansion Word2Vec for Sentiment Analysis of Public Policy in Twitter. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 6(1), 78 - 84. https://doi.org/10.29207/resti.v6i1.3525
Section
Artikel Rekayasa Sistem Informasi