Implementation Word2Vec for Feature Expansion in Twitter Sentiment Analysis

  • Naufal Adi Nugroho, Telkom University
  • Erwin Budi Setiawan, Telkom University
Keywords: Sentiment Analysis, SVM, ANN, Word2Vec, TF-IDF



Twitter is a microblog-based social media site launched on July 13, 2006. In March 2020, the Institute for Development of Economics and Finance (Indef) captured 476,696 tweets about government policy on COVID-19. Government policy is commonly defined as a decision made systematically by the government, with specific goals and objectives relating to the public interest, whether carried out directly or indirectly. Sentiment analysis analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language, and over the past decade it has become a popular research area. This paper focuses on implementing Word2Vec word similarity as a feature expansion method that reduces vocabulary mismatch in Twitter sentiment analysis using word embeddings. The dataset contains 11,395 tweets and is used with two classifiers: the Support Vector Machine (SVM) algorithm and the Artificial Neural Network (ANN) algorithm. The output of Word2Vec is used for feature expansion: the expansion algorithm checks each row of the corpus for words whose vectors are similar to a given feature word and, if that feature's value in the row is 0, replaces it based on the similar word. The corpus used to build the feature expansion consists of 142,545 articles from Indonesian media. The results show that ANN outperforms SVM: ANN achieves 68.89% accuracy without feature expansion and 72.58% with it, while SVM achieves 63.95% without feature expansion and 68.56% with it. These results show that feature expansion can improve the final accuracy.
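The abstract describes the expansion step only briefly, so the following is a minimal sketch of one plausible reading of it, not the authors' implementation. It assumes that tweets are already tokenized, that features are TF-IDF weights (a smoothed IDF is used here so every term present in a tweet has a nonzero weight), and that the similar-word lists are supplied as a plain dictionary. In practice those lists might come from a trained Word2Vec model, for example gensim's `KeyedVectors.most_similar`, but that step is not shown. Copying the weight of the first present similar word into a zero-valued feature is an assumed rule; the paper may assign a different value. The function names `tfidf_matrix` and `expand_features` and the toy tweets are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs, vocab):
    """Build a TF-IDF matrix (one row per tokenized tweet, one column per term).

    Uses raw term counts times a smoothed IDF, ln((1 + n) / (1 + df)) + 1,
    so every term that occurs in a tweet gets a nonzero weight.
    """
    n = len(docs)
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[t] * idf[t] for t in vocab])
    return rows


def expand_features(matrix, vocab, similar):
    """Fill zero-valued features from a similar word present in the same row.

    `similar` maps a vocabulary term to a ranked list of similar words
    (hypothetical stand-in for Word2Vec most-similar output). Assumed rule:
    a zero feature takes the weight of the first similar word that is
    nonzero in the ORIGINAL row, so expansions never chain.
    """
    index = {t: i for i, t in enumerate(vocab)}
    expanded = [row[:] for row in matrix]  # leave the input matrix untouched
    for r, row in enumerate(matrix):
        for term, col in index.items():
            if row[col] != 0:
                continue  # feature already present in this tweet
            for candidate in similar.get(term, []):
                j = index.get(candidate)
                if j is not None and row[j] != 0:
                    expanded[r][col] = row[j]
                    break
    return expanded


if __name__ == "__main__":
    # Toy tokenized tweets (hypothetical, not from the paper's dataset).
    tweets = [["vaccine", "free"], ["vaccination", "slow"], ["lockdown", "slow"]]
    vocab = sorted({t for tweet in tweets for t in tweet})
    # Hypothetical similar-word lists; a trained Word2Vec model would supply these.
    similar = {"vaccine": ["vaccination"], "vaccination": ["vaccine"]}
    before = tfidf_matrix(tweets, vocab)
    after = expand_features(before, vocab, similar)
    print(vocab)
    for b, a in zip(before, after):
        print([round(v, 3) for v in b], "->", [round(v, 3) for v in a])
```

In the first toy tweet the zero-valued `vaccination` feature takes the weight of `vaccine`, and the second tweet gains `vaccine` from `vaccination`; in the third tweet neither word occurs, so nothing is filled in. The expanded rows would then be passed to the SVM and ANN classifiers in place of the original TF-IDF rows.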





How to Cite
Naufal Adi Nugroho, & Erwin Budi Setiawan. (2021). Implementation Word2Vec for Feature Expansion in Twitter Sentiment Analysis. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 5(5), 837 - 842.
Information Technology Articles