Implementation of n-gram Methodology to Analyze Sentiment Reviews for Indonesian Chips Purchases in Shopee E-Marketplace
Abstract
Chips are a well-known product among Small and Medium Enterprises (SMEs). Sentiment analysis of customer reviews is a crucial step toward improving the quality of chips as an SME product. In this research, sentiment analysis of chip purchase reviews on the Shopee E-Marketplace was conducted using Natural Language Processing (NLP), with the N-Gram model and Term Frequency-Inverse Document Frequency (TF-IDF) as feature extraction techniques and the Support Vector Machine (SVM) algorithm for sentiment classification. The objective of this research is to identify the most suitable feature extraction model and the optimal SVM kernel among the Linear, Polynomial, Gaussian RBF, and Sigmoid kernels. The experiments indicate that TF-IDF with unigram features offers the best performance for SVM classification when the Linear kernel is used. Labeling the dataset with a lexicon-based approach showed that 84.31% of all reviews were positive. In the unigram model, the words "price", "cheap", and "quality" have the highest weights, each above 0.040. With the unigram model and the Linear kernel, accuracy is 88.4%, precision is 87.3%, and recall is 88.4%. The F1-scores were 86.9% for unigram, 78.5% for bigram, and 77.4% for trigram. Overall, the unigram model combined with a Linear kernel in the SVM algorithm shows strong potential for use in systems that detect sentiment in Indonesian-language user reviews on the Shopee E-Marketplace.
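To make the feature extraction step concrete, the following is a minimal sketch of n-gram extraction with TF-IDF weighting in plain Python. It is not the authors' implementation: the function names `ngrams` and `tfidf`, the whitespace tokenizer, the toy reviews, and the unsmoothed weighting formula (term frequency times ln(N/df)) are all assumptions chosen for illustration, and the weights it produces are not comparable to those reported in the abstract.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Return one {term: weight} dict per document for n-gram features.

    Assumed weighting (illustrative only):
      tf  = occurrences of the term in the document / n-grams in the document
      idf = ln(N / df), where df is the number of documents containing the term
    """
    # Assumption: lowercase and split on whitespace; no stemming or stopwords.
    grams = [ngrams(doc.lower().split(), n) for doc in docs]
    # Document frequency: count each term once per document.
    df = Counter(term for g in grams for term in set(g))
    n_docs = len(docs)
    weights = []
    for g in grams:
        counts = Counter(g)
        total = len(g) or 1  # avoid division by zero for very short documents
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical toy reviews in Indonesian, not taken from the study's dataset.
reviews = [
    "harga murah kualitas bagus",
    "harga mahal rasa kurang",
    "rasa enak harga murah",
]

unigram_weights = tfidf(reviews, n=1)
bigram_weights = tfidf(reviews, n=2)
```

In this sketch a term that occurs in every review, such as "harga", receives a weight of zero, while rarer terms are weighted more heavily. The resulting vectors would then be passed to a classifier such as an SVM; that step is omitted here.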
Copyright (c) 2023 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author.
- The author acknowledges that Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) is the first publisher of the work, under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate, non-exclusive arrangements for distributing the published version of the manuscript (e.g., depositing it in the author's institutional repository or publishing it in a book), provided they acknowledge that the manuscript was first published in Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi).