Comparison of LSTM and IndoBERT Method in Identifying Hoax on Twitter
Abstract
In recent years, the number of social media users has increased significantly. In January 2022, social media users in Indonesia reached 191 million, an increase of 12.35% over the previous year's 170 million. With this growth, more and more people seek and consume information through social media. Despite the many advantages of social media, the quality of its information is lower than that of traditional news media, and a great deal of hoax information spreads there. The harm caused by hoaxes has motivated much research on detecting hoax information on social media, especially information that spreads widely on Twitter. Previous studies have applied a variety of machine learning and deep learning models to hoax detection, and deep learning performs well on text classification tasks, hoax detection in particular. The aim of this paper is to compare the LSTM and IndoBERT methods for detecting hoaxes using a dataset collected from Twitter. Two experiments are conducted, one with LSTM and one with IndoBERT. The reported results are averages obtained with 10-fold cross-validation. The IndoBERT model achieves an average accuracy of 92.07%, while the LSTM model achieves 87.54%. IndoBERT therefore performs well on the hoax detection task and outperforms the LSTM model, giving the best average accuracy in this study.
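The averaging over 10-fold cross-validation mentioned in the abstract can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the helper names (`k_fold_indices`, `cross_validate`), the toy data, and the majority-label placeholder classifier are all assumptions made for the example. In the study, the fitted model would be an LSTM or IndoBERT classifier.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k disjoint folds (hypothetical helper)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k splits the shuffled indices into k roughly equal folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validate(texts, labels, fit, predict, k=10, seed=0):
    """Return the per-fold accuracies and their mean."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(texts), k, seed):
        model = fit([texts[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct = sum(predict(model, texts[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores, mean(scores)


# Placeholder classifier (assumption): always predicts the majority
# label seen in the training fold. It stands in for LSTM or IndoBERT.
def fit_majority(train_texts, train_labels):
    return max(set(train_labels), key=train_labels.count)


def predict_majority(model, text):
    return model


# Toy data (assumption): 100 fake tweets, one in four labelled hoax (1).
texts = [f"tweet {i}" for i in range(100)]
labels = [1 if i % 4 == 0 else 0 for i in range(100)]

scores, avg = cross_validate(texts, labels, fit_majority, predict_majority)
print(f"folds: {len(scores)}, average accuracy: {avg:.4f}")
```

Each sample is used for testing exactly once, so the mean over the ten folds summarizes performance on the whole dataset rather than on a single split.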
Copyright (c) 2023 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author
- The author acknowledges that the RESTI Journal (System Engineering and Information Technology) is the first publisher of the article, under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the published version of the manuscript (e.g., depositing it in the author's institutional repository or publishing it in a book), provided they acknowledge that the manuscript was first published in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal.