Feature Expansion Word2Vec for Sentiment Analysis of Public Policy in Twitter
Abstract
Social media users, especially on Twitter, can freely express opinions and share information in tweets on any topic, including their responses to public policy. Each tweet is limited to 280 characters, which can lead to problems such as vocabulary mismatch. This study therefore applies Word2Vec feature expansion to mitigate vocabulary mismatch. It develops a Twitter sentiment analysis system that uses Word2Vec feature expansion with the Logistic Regression (LR) and Support Vector Machine (SVM) classification algorithms, and compares it with a system that does not use feature expansion. The results show that Word2Vec feature expansion with the SVM classifier increased system accuracy by up to 0.99%, reaching an accuracy of 78.99%.
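The abstract does not spell out the expansion procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: a tweet is represented as a binary bag-of-words vector, and a feature that would be zero is switched on when a word similar to it appears in the tweet. The similarity table `SIMILAR`, the vocabulary `VOCAB`, and the function names are hypothetical and purely illustrative; in a real system the similar-word lists would presumably come from a trained Word2Vec model, and the expanded vectors would then be passed to a classifier such as LR or SVM.

```python
# Hypothetical similarity table (an assumption for illustration). In practice
# it might be built from the nearest neighbours of a trained Word2Vec model.
SIMILAR = {
    "policy": ["regulation", "rule"],
    "good": ["great", "nice"],
    "bad": ["poor", "awful"],
}

# Hypothetical feature vocabulary; one vector position per word.
VOCAB = ["policy", "good", "bad"]


def vectorize(tokens, vocab):
    """Binary bag-of-words: 1 if the vocabulary word occurs in the tweet."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocab]


def expand(tokens, vocab, similar):
    """Set a zero feature to 1 when one of its similar words occurs in the tweet."""
    present = set(tokens)
    vector = vectorize(tokens, vocab)
    for i, word in enumerate(vocab):
        if vector[i] == 0 and present.intersection(similar.get(word, [])):
            vector[i] = 1
    return vector


tweet = "the new regulation is great".split()
baseline = vectorize(tweet, VOCAB)
expanded = expand(tweet, VOCAB, SIMILAR)
print(baseline)  # [0, 0, 0]
print(expanded)  # [1, 1, 0]
```

In this toy example the tweet shares no word with the vocabulary, so the baseline vector is all zeros (a vocabulary mismatch), while the expanded vector recovers two features through similar words.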
Copyright (c) 2022 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright of each article belongs to its authors.
- The authors acknowledge Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) as the publisher that first published the work under a Creative Commons Attribution 4.0 International License.
- The authors may separately arrange non-exclusive distribution of the manuscript published in this journal in other versions (e.g., deposit in the authors' institutional repository, publication in a book, etc.), with an acknowledgment that the manuscript was first published in Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi).