Implementation of BERT, IndoBERT, and CNN-LSTM in Classifying Public Opinion about COVID-19 Vaccine in Indonesia
Abstract
COVID-19 was classified as a pandemic in March 2020, and by July 2021 variants of the virus had spread all over the world, including Indonesia. Its detrimental effects are hard to avoid because the virus carries a high risk of transmission during daily activities. To protect themselves from COVID-19, people need to be vaccinated. Indonesian citizens have responded to the vaccine expressively, sharing their opinions by, for example, posting text on Twitter. These expressions can be analyzed with the deep learning models BERT, CNN-LSTM, and IndoBERTweet to identify negative speech categories such as anxiety, panic, and emotion, or positive speech such as statements that the vaccine worked well. The three methods were used to predict sentiment about vaccination from tweets posted between January 2021 and March 2022. IndoBERT classified sentiment as positive at around 80%, IndoBERTweet at 68%, and CNN-LSTM at 53%, using a total of 2020 data items from Twitter. These results offer lessons that Indonesia's government and other authorities can use for continued improvement in ending the COVID-19 pandemic.
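A percentage reported for a sentiment classifier can refer to different quantities. The minimal sketch below, which is not the authors' code, shows two common ones: accuracy against reference labels and the share of items predicted as each label. The function names `accuracy` and `label_shares` and the toy labels are hypothetical and chosen only for illustration; which quantity the figures above correspond to is not specified in the abstract.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def label_shares(y_pred):
    """Fraction of items assigned to each predicted label."""
    if not y_pred:
        raise ValueError("at least one prediction is required")
    counts = Counter(y_pred)
    return {label: n / len(y_pred) for label, n in counts.items()}


# Hypothetical toy data, not taken from the study's dataset.
gold = ["positive", "negative", "positive", "positive", "negative"]
pred = ["positive", "positive", "positive", "negative", "negative"]

print(f"accuracy: {accuracy(gold, pred):.0%}")
for label, share in sorted(label_shares(pred).items()):
    print(f"predicted {label}: {share:.0%}")
```

On this toy data the two quantities happen to coincide for the positive class, which illustrates why a bare percentage should be reported together with the metric it measures.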
Copyright (c) 2022 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author.
- The author acknowledges that Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) is the first publisher of the article, under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the published version of the manuscript (e.g., depositing it in the author's institutional repository or publishing it in a book), with an acknowledgment that the manuscript was first published in Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi).