Word2Vec on Sentiment Analysis with Synthetic Minority Oversampling Technique and Boosting Algorithm

Rayhan Rahmanda; Erwin Budi Setiawan

doi:10.29207/resti.v6i4.4186

Rayhan Rahmanda, Telkom University
Erwin Budi Setiawan, Telkom University

Keywords: sentiment analysis, logistic regression, Word2Vec, Twitter

Abstract

Customer opinion is an important factor in the success of a company or service provider. By determining the sentiment of existing opinions, a company can use them as evaluation material to improve the quality of its services or products. Sentiment analysis measures opinion sentiment by taking a corpus as input and classifying it into positive or negative classes, which indicates the level of customer satisfaction with a product or service. Aspect-based sentiment analysis lets companies analyze opinions more specifically and find out which aspects need improvement. In this research, aspect-based sentiment analysis was conducted on tweets from Telkomsel users on Twitter. The data consist of 16,992 tweets from users discussing aspects such as Telkomsel's service and signal. Word2Vec was used for feature expansion to minimize the vocabulary mismatch caused by the limited number of words in tweets. The results show that the combination of Word2Vec, the Synthetic Minority Oversampling Technique (SMOTE), and a boosting algorithm with a Logistic Regression classifier achieved the highest accuracy of 95.10% for the signal aspect, while using hyperparameters gave the service aspect its highest accuracy of 93.34%.
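The feature expansion idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity table `SIMILAR`, the function `expand_features`, and the example words are all hypothetical, and the sketch assumes that a feature absent from a tweet is switched on when one of its Word2Vec-similar words occurs in the tweet. In practice the table would be built from the nearest neighbours of a trained Word2Vec model.

```python
# Hypothetical similarity table (assumption): each vocabulary word maps to
# words a trained Word2Vec model might rank as most similar to it.
SIMILAR = {
    "sinyal": ["jaringan", "koneksi"],
    "lambat": ["lemot", "lelet"],
}


def expand_features(tokens, vocabulary, similar):
    """Return a binary feature vector over `vocabulary` for one tweet.

    A feature is 1 if the word itself occurs in the tweet, or if any word
    listed as similar to it occurs in the tweet (the expansion step).
    """
    present = set(tokens)
    vector = []
    for word in vocabulary:
        if word in present:
            vector.append(1)
        elif any(s in present for s in similar.get(word, [])):
            vector.append(1)
        else:
            vector.append(0)
    return vector


# Made-up example: the tweet shares no word with the vocabulary.
vocabulary = ["sinyal", "lambat", "mahal"]
tweet = ["jaringan", "telkomsel", "lemot"]

plain = [1 if w in tweet else 0 for w in vocabulary]
expanded = expand_features(tweet, vocabulary, SIMILAR)

print(plain)     # [0, 0, 0]
print(expanded)  # [1, 1, 0]
```

Without expansion the tweet yields an all-zero vector, which is the vocabulary mismatch problem; with expansion, two features are recovered through similar words. The expanded vectors would then be passed to the oversampling and classification steps.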
