Word2Vec on Sentiment Analysis with Synthetic Minority Oversampling Technique and Boosting Algorithm
Abstract
Customer opinion is an important factor in the success of a company or service provider. By determining the sentiment of these opinions, a company can use them as evaluation material to improve the quality of its services or products. Sentiment analysis measures opinion sentiment: it takes a corpus as input and classifies each text as positive or negative, giving an indication of customer satisfaction with a product or service. Aspect-based sentiment analysis lets companies analyze opinions more specifically and identify which aspects need improvement. In this research, aspect-based sentiment analysis was conducted on tweets from Telkomsel users on Twitter. The data consist of 16,992 tweets in which users discuss aspects such as Telkomsel's service and signal. Word2Vec was used for feature expansion to minimize the vocabulary mismatch caused by the limited number of words in a tweet. The results show that the combination of Word2Vec, the Synthetic Minority Oversampling Technique (SMOTE), and a boosting algorithm with a Logistic Regression classifier achieved the highest accuracy, 95.10%, for the signal aspect, while for the service aspect the highest accuracy, 93.34%, was obtained with hyperparameter tuning.
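The idea of feature expansion mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity table `SIMILAR`, the vocabulary, and the example tweet are hypothetical, and the table merely stands in for the most-similar word lists that a trained Word2Vec model would supply. The sketch assumes a binary bag-of-words representation in which a feature that is absent from the tweet is switched on when one of its similar words is present.

```python
from typing import Dict, List

# Hypothetical similarity table standing in for the top-n most similar words
# of a trained Word2Vec model; the entries are made up for illustration.
SIMILAR: Dict[str, List[str]] = {
    "signal": ["network", "connection"],
    "slow": ["lag", "laggy"],
    "service": ["support", "helpdesk"],
}


def bag_of_words(tokens: List[str], vocabulary: List[str]) -> List[int]:
    """Binary vector: 1 if the vocabulary word occurs in the tweet, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


def expand_features(
    tokens: List[str],
    vocabulary: List[str],
    similar: Dict[str, List[str]],
) -> List[int]:
    """Binary vector in which a missing feature is set to 1 when any of its
    similar words occurs in the tweet."""
    present = set(tokens)
    vector = []
    for word in vocabulary:
        if word in present:
            vector.append(1)
        elif any(s in present for s in similar.get(word, [])):
            # The feature word itself is absent, but a similar word is present.
            vector.append(1)
        else:
            vector.append(0)
    return vector


vocabulary = ["signal", "slow", "service", "price"]
tweet = ["network", "is", "laggy", "today"]

baseline = bag_of_words(tweet, vocabulary)               # [0, 0, 0, 0]
expanded = expand_features(tweet, vocabulary, SIMILAR)   # [1, 1, 0, 0]
```

Without expansion the tweet shares no word with the vocabulary and yields an all-zero vector; with expansion, "network" and "laggy" activate the "signal" and "slow" features, which is the kind of vocabulary mismatch the abstract refers to.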
Copyright (c) 2022 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author.
- The author acknowledges that the RESTI Journal (System Engineering and Information Technology) is the first publisher of the work, which is licensed under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of manuscripts published in this journal in other versions (e.g., depositing them in the author's institutional repository or publishing them in a book), provided they acknowledge that the manuscript was first published in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal.