Application of Naïve Bayes Algorithm Variations On Indonesian General Analysis Dataset for Sentiment Analysis
The Indonesian General Analysis Dataset was collected from the social media platform Twitter using conjunctions as search keywords, so that the tweets are not restricted to any particular topic. Because it covers general topics, an Indonesian-language dataset of this kind can be used to test the accuracy of classification models and thus provide an additional reference for choosing appropriate methods and parameters for sentiment analysis. Naive Bayes, which has several variations, is one of the algorithms that achieved the highest accuracy in several previous studies. This study aims to identify the naive Bayes variation with the best accuracy on the Indonesian General Analysis Dataset by tuning the minimum and maximum document frequency parameters. The variations examined are Bernoulli naive Bayes, Gaussian naive Bayes, complement naive Bayes and multinomial naive Bayes. The research begins with downloading the dataset. The data are then preprocessed by tokenizing, stemming, converting abbreviations and removing conjunctions. Feature extraction converts the preprocessed text into vectors weighted with TF-IDF before the sentiment classification stage. Each naive Bayes variation was tested with different minimum document frequency (min-df) and maximum document frequency (max-df) values to find suitable parameters. K-fold cross validation was used to split the dataset into training and test data, and a confusion matrix was then constructed to evaluate accuracy.
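The evaluation pipeline described above (TF-IDF features bounded by min-df and max-df, four naive Bayes variations, k-fold cross validation and a confusion matrix) can be sketched in plain Python. This is a minimal illustrative sketch under stated assumptions, not the authors' code: the function names `build_vocab`, `tfidf`, `train`, `predict` and `cross_validate` are hypothetical; tweets are assumed to be already tokenized and preprocessed (stemming, abbreviation conversion and conjunction removal are not shown); the smoothed IDF and the Gaussian variance floor are modelled on scikit-learn defaults; folds are interleaved without shuffling; and the toy tweets are invented. In scikit-learn the same building blocks are `TfidfVectorizer(min_df=..., max_df=...)` and the `BernoulliNB`, `GaussianNB`, `ComplementNB` and `MultinomialNB` classes (`GaussianNB` needs dense input, so a sparse TF-IDF matrix must be converted with `.toarray()` first).

```python
import math
from collections import Counter


def build_vocab(docs, min_df=1, max_df=1.0):
    """Keep terms whose document frequency (DF) lies in [min_df, max_df].

    An int is an absolute document count, a float a proportion of documents
    (the same convention as scikit-learn's TfidfVectorizer). Also returns a
    smoothed IDF, ln((1 + n) / (1 + df)) + 1, for the surviving terms.
    """
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    lo = min_df if isinstance(min_df, int) else min_df * n
    hi = max_df if isinstance(max_df, int) else max_df * n
    vocab = sorted(t for t, c in df.items() if lo <= c <= hi)
    return vocab, {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}


def tfidf(doc, vocab, idf):
    """Raw term count times IDF, L2-normalised; out-of-vocabulary terms are ignored."""
    tf = Counter(doc)
    vec = [tf[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def col_sums(rows, V):
    return [sum(r[j] for r in rows) for j in range(V)]


def train(variant, X, y, alpha=1.0):
    """Return {class: score(x)} for one naive Bayes variant; the highest score wins."""
    n, V = len(X), len(X[0])
    # Gaussian variance floor: 1e-9 times the largest feature variance (like var_smoothing).
    mean = [s / n for s in col_sums(X, V)]
    spread = [sum((x[j] - mean[j]) ** 2 for x in X) / n for j in range(V)]
    eps = 1e-9 * max(spread, default=0.0) or 1e-9
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == c]
        prior = math.log(len(rows) / n)
        if variant in ("multinomial", "complement"):
            # Complement NB estimates term weights from all *other* classes.
            src = rows if variant == "multinomial" else [x for x, lab in zip(X, y) if lab != c]
            tot = col_sums(src, V)
            denom = sum(tot) + alpha * V
            w = [math.log((t + alpha) / denom) for t in tot]
            if variant == "multinomial":
                model[c] = lambda x, w=w, p=prior: p + sum(a * b for a, b in zip(x, w))
            else:  # low complement likelihood means high score; prior omitted
                model[c] = lambda x, w=w: -sum(a * b for a, b in zip(x, w))
        elif variant == "bernoulli":
            # Smoothed share of class documents containing each term (binarised at > 0).
            p = [(sum(r[j] > 0 for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(V)]
            model[c] = lambda x, p=p, pr=prior: pr + sum(
                math.log(q) if a > 0 else math.log(1 - q) for a, q in zip(x, p))
        elif variant == "gaussian":
            mu = [s / len(rows) for s in col_sums(rows, V)]
            var = [sum((r[j] - mu[j]) ** 2 for r in rows) / len(rows) + eps for j in range(V)]
            model[c] = lambda x, mu=mu, var=var, pr=prior: pr + sum(
                -0.5 * math.log(2 * math.pi * v) - (a - m) ** 2 / (2 * v)
                for a, m, v in zip(x, mu, var))
        else:
            raise ValueError(f"unknown variant: {variant}")
    return model


def predict(model, x):
    return max(model, key=lambda c: model[c](x))


def cross_validate(docs, labels, variant, k=5, min_df=1, max_df=1.0):
    """k-fold CV (document i goes to fold i % k, no shuffling). The vocabulary
    and IDF are refit on each training fold so test documents never affect them."""
    classes = sorted(set(labels))
    cm = [[0] * len(classes) for _ in classes]  # rows: true class, columns: predicted
    for f in range(k):
        tr = [i for i in range(len(docs)) if i % k != f]
        vocab, idf = build_vocab([docs[i] for i in tr], min_df, max_df)
        model = train(variant, [tfidf(docs[i], vocab, idf) for i in tr],
                      [labels[i] for i in tr])
        for i in range(f, len(docs), k):
            pred = predict(model, tfidf(docs[i], vocab, idf))
            cm[classes.index(labels[i])][classes.index(pred)] += 1
    return cm, sum(cm[j][j] for j in range(len(classes))) / len(docs)


if __name__ == "__main__":
    # Hypothetical, already-preprocessed toy tweets (not the study's dataset).
    docs = [["bagus", "suka"], ["suka", "mantap"], ["bagus", "mantap", "suka"],
            ["buruk", "kecewa"], ["kecewa", "jelek"], ["buruk", "jelek", "kecewa"]] * 2
    labels = (["positif"] * 3 + ["negatif"] * 3) * 2
    for variant in ("bernoulli", "gaussian", "complement", "multinomial"):
        for min_df, max_df in ((1, 1.0), (2, 0.9)):
            cm, acc = cross_validate(docs, labels, variant, 3, min_df, max_df)
            print(f"{variant:12} min_df={min_df} max_df={max_df} cm={cm} acc={acc:.2f}")
```

Refitting the vocabulary inside each training fold means the min-df/max-df filter only ever sees training documents. A parameter search per variation then amounts to looping `cross_validate` over candidate `(min_df, max_df)` pairs and comparing the resulting accuracies.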
Copyright (c) 2022 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright of each article belongs to its authors.
- The authors acknowledge that the RESTI Journal (System Engineering and Information Technology) is the first publisher and publishes the work under the Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the version of the manuscript published in this journal (e.g., posting it to the author's institutional repository or publishing it in a book), with an acknowledgement that the manuscript was first published in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal.