Advancing Hate Speech Detection in Indonesian Language Using Graph Neural Networks and TF-IDF

Syaikha Amirah Zikrina; Fitriyani

doi:10.29207/resti.v9i1.6179

Syaikha Amirah Zikrina, Telkom University
Fitriyani, Telkom University

Keywords: Context-Aware Sentiment Analysis, Hate Speech Detection, Graph Neural Network (GNN), Social Media X, TF-IDF

Abstract

Hate speech and abusive content on social media, particularly in the Indonesian language, present significant challenges for content moderation systems. Previous research has applied machine learning models such as Recurrent Neural Networks (RNN), Support Vector Machines (SVM), and Convolutional Neural Networks (CNN) to this problem. However, these approaches are limited in their ability to capture the relational and contextual nuances in the data, which results in suboptimal performance. This study combines Graph Neural Networks (GNN) with Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction to improve hate speech detection on Twitter (now X). The dataset consists of 13,169 Indonesian tweets, manually labeled for the hate speech and abusive categories. Preprocessing includes text cleaning, stemming, stop-word removal, and normalization. The GNN model achieved accuracy of 92.90% for the Abusive category and 89.78% for Hate Speech, outperforming the RNN model, which achieved 86.09% and 86.15%, respectively. These results highlight the advantage of graph-based approaches in capturing complex relationships within text data. Future research could expand the dataset to include regional dialects and integrate more advanced feature extraction techniques such as Word2Vec or BERT. The study offers a framework for improving hate speech detection and contributes to safer digital environments.
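
The abstract does not describe how the TF-IDF features are connected to the graph model, so the following is only a minimal sketch of one plausible arrangement, not the authors' implementation. It assumes each tweet is a graph node, that edges link tweets whose TF-IDF vectors have cosine similarity above a hypothetical threshold, and that a single mean-aggregation step stands in for a GNN layer (no learned weights or classifier). The token lists, the smoothed IDF variant, the threshold value, and all function names are illustrative assumptions.

```python
import math
from collections import Counter

# Invented toy token lists standing in for preprocessed tweets (not from the dataset).
docs = [
    ["kamu", "bodoh", "sekali"],
    ["dasar", "bodoh"],
    ["cuaca", "cerah", "sekali"],
]

def tfidf(docs):
    """Return (vocab, rows) where rows[i][k] is the TF-IDF weight of vocab[k] in docs[i]."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    # Smoothed IDF (an assumed variant): log((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for d in docs:
        tf = Counter(d)
        length = max(len(d), 1)  # guard against empty documents
        rows.append([tf[t] / length * idf[t] for t in vocab])
    return vocab, rows

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def build_adjacency(X, threshold=0.1):
    """Link documents whose similarity exceeds the (hypothetical) threshold; add self-loops."""
    n = len(X)
    return [
        [1.0 if i == j or cosine(X[i], X[j]) > threshold else 0.0 for j in range(n)]
        for i in range(n)
    ]

def propagate(A, X):
    """One message-passing step with mean aggregation: H = D^-1 A X."""
    n, dim = len(X), len(X[0])
    H = []
    for i in range(n):
        deg = sum(A[i])  # at least 1 because of the self-loop
        H.append([sum(A[i][j] * X[j][k] for j in range(n)) / deg for k in range(dim)])
    return H

vocab, X = tfidf(docs)
A = build_adjacency(X, threshold=0.1)
H = propagate(A, X)
```

After the propagation step, each row of `H` mixes a tweet's own TF-IDF vector with those of its neighbors, which is the sense in which a graph model can use relations between texts rather than treating each text in isolation. A full model would add learned weight matrices, nonlinearities, and a classification layer on top.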
