Sentiment Analysis on KAI Twitter Post Using Multiclass Support Vector Machine (SVM)
Abstract
Information in the form of unstructured text is growing and has become commonplace on the internet. Business people and companies can easily find and use this information through social media, one of which is Twitter. Twitter ranks 6th among the most widely accessed social media platforms today. However, Twitter data are large and unstructured, which makes it difficult for business people or companies with limited resources to learn the public's opinion of their services. To help businesses understand public sentiment and improve their services in the future, public sentiment on Twitter needs to be classified as positive, neutral, or negative. The Multiclass Support Vector Machine (SVM) is a supervised learning classification method that can handle this three-class problem. This paper uses the One Against All (OAA) approach to determine the class. It reports the results of classifying tweet data with the OAA Multiclass SVM method under five different weighting features (unigram, bigram, trigram, unigram+bigram, and word cloud), with the aim of finding the best accuracy and the most important feature when processing large data. The highest accuracy, 80.59%, is achieved by the unigram TF-IDF model combined with the OAA Multiclass SVM with gamma 0.7.
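The n-gram TF-IDF weighting schemes named above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes simple lowercase whitespace tokenization and the plain tf * log(N/df) weighting, the function names are hypothetical, and the example tweets are made up. The word-cloud feature is not covered.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, orders):
    """Lowercase, split on whitespace (an assumed, simplified tokenizer) and
    collect the n-grams for every order in `orders`:
    (1,) = unigram, (2,) = bigram, (3,) = trigram, (1, 2) = unigram+bigram."""
    tokens = text.lower().split()
    features = []
    for n in orders:
        features.extend(ngrams(tokens, n))
    return features


def tfidf(docs, orders=(1,)):
    """Weight each feature as raw term frequency * log(N / df).
    Returns one {feature: weight} dict per document."""
    feature_lists = [extract_features(d, orders) for d in docs]
    n_docs = len(docs)
    df = Counter()
    for features in feature_lists:
        df.update(set(features))  # document frequency: each doc counted once
    weights = []
    for features in feature_lists:
        tf = Counter(features)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


# Hypothetical tweets, for illustration only
tweets = ["train late again", "train seat comfortable", "train ticket sold out"]
unigram = tfidf(tweets, orders=(1,))
uni_bigram = tfidf(tweets, orders=(1, 2))
print(unigram[0])
```

In this toy example, "train" occurs in every tweet, so its idf and therefore its weight are 0, while "late" occurs in one of three tweets and receives log 3 (about 1.10). Passing `orders=(1, 2)` concatenates unigram and bigram features, mirroring the unigram+bigram scheme.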
shows the accuracy values obtained for each gamma value.
Table 13 shows the confusion matrix of the best-performing configuration, which was obtained by testing the Multiclass Support Vector Machine (SVM) method with a gamma value of 0.7 and unigram TF-IDF weighting. As Table 13 shows, 35 tweets are correctly classified as positive, 365 as neutral, and 166 as negative.
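Accuracy is read off a confusion matrix by dividing the diagonal (correct classifications) by the total number of test tweets; for Table 13 the diagonal sums to 35 + 365 + 166 = 566 correctly classified tweets. Because the off-diagonal cells of Table 13 are not reproduced in this section, the sketch below uses a small hypothetical 3 x 3 matrix instead. It assumes rows are actual classes and columns are predicted classes, a convention the paper's table may or may not follow.

```python
LABELS = ["positive", "neutral", "negative"]

# Hypothetical confusion matrix: rows = actual class, columns = predicted class.
cm = [
    [8, 2, 0],
    [1, 15, 4],
    [0, 3, 7],
]


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall(matrix, k):
    """Share of actual class-k samples (row k) that were predicted as k."""
    return matrix[k][k] / sum(matrix[k])


def precision(matrix, k):
    """Share of class-k predictions (column k) that truly belong to class k."""
    return matrix[k][k] / sum(row[k] for row in matrix)


print(f"accuracy = {accuracy(cm):.2%}")  # accuracy = 75.00%
for k, label in enumerate(LABELS):
    print(f"{label}: precision = {precision(cm, k):.2f}, recall = {recall(cm, k):.2f}")
```

Per-class precision and recall are included because accuracy alone can hide weak performance on a small class, such as the positive class in an imbalanced dataset.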
Conclusion
Millions of Twitter users post their opinions in tweets. Businesses can use this information to their advantage, but processing it takes a lot of time. Sentiment analysis that predicts tweet sentiment with TF-IDF and a machine learning method is therefore needed. After applying five different TF-IDF feature weighting approaches with the Multiclass Support Vector Machine (SVM) method to classify tweet data from the @KAI121 account, the researchers draw the following conclusions. The highest accuracy of the OAA Multiclass SVM method for sentiment analysis, 80.59%, was obtained with a 90:10 data ratio, the unigram scheme, TF-IDF weighting, and a gamma value of 0.7. The most important feature in this study is the unigram feature, because it represents unique features and produces a high accuracy value. The gamma value affects the classification results: the smaller the gamma value, the higher the accuracy tends to be. Based on the research results, the @KAI121 account data show 11% positive, 58% neutral, and 31% negative sentiment. PT. KAI is expected to improve its services to users of railroad transportation, since the positive sentiment rate, which reflects train user satisfaction, is still below average.
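The One Against All (OAA) decision rule behind the classifier can be sketched as follows. Each of the three binary SVMs is trained to separate one class from the other two, and a tweet is assigned to the class whose decision function returns the largest value. This is a minimal sketch, not the paper's implementation: the linear decision functions and their weights are hypothetical stand-ins for trained SVMs, and because this section does not specify the kernel or the exact role of the gamma parameter, neither is modelled here.

```python
def linear_decision(w, b):
    """Hypothetical trained binary SVM: f(x) = w . x + b (class k vs rest)."""
    def f(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return f


def oaa_predict(x, classifiers):
    """One Against All: score x with every 'class k vs rest' classifier
    and return the label with the highest decision value."""
    scores = {label: f(x) for label, f in classifiers.items()}
    return max(scores, key=scores.get)


# Made-up weights over a 2-dimensional feature vector, for illustration only
classifiers = {
    "positive": linear_decision([1.0, -0.5], -0.2),
    "neutral": linear_decision([-0.3, -0.3], 0.4),
    "negative": linear_decision([-0.8, 1.2], -0.1),
}

print(oaa_predict([0.9, 0.1], classifiers))  # positive
print(oaa_predict([0.1, 0.8], classifiers))  # negative
```

Taking the maximum of the raw decision values also settles cases where several binary classifiers, or none of them, claim the input for their own class.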
Based on these findings, several suggestions can be made for further research. Manual labeling should be carried out by multiple linguists so that the data used are more valid. To improve testing accuracy, future work can enlarge the dataset and add vocabulary to the normalization list so that the dataset is more balanced than before. Sentiment analysis can also be conducted with different classification methods and weighting features.
Copyright (c) 2020 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright of each article belongs to the author.
- The author acknowledges that the RESTI Journal (System Engineering and Information Technology) is the first publisher and publishes the work under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the manuscript version published in this journal (e.g. posting it to the author's institutional repository or publishing it in a book), provided they acknowledge that the manuscript was first published in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal.