Comparative Analysis of SVM, XGBoost and Neural Network on Hate Speech Classification
Abstract
On social media, hate speech is conveyed through text, images, and videos, and it can provoke people to break the law and harm others. Early detection of hate speech using machine learning algorithms is therefore necessary. This study analyzes the accuracy, precision, recall, and F1-score of three algorithms (SVM, XGBoost, and Neural Network) for hate speech classification, using a public dataset of Indonesian-language hate speech from Twitter. The results show that the SVM algorithm achieves an accuracy of 83.2%, a precision of 83%, a recall of 83%, and an F1-score of 83%, the highest of the three algorithms compared (SVM, XGBoost, and Neural Network). The SVM algorithm can therefore be considered for use in hate speech classification.
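The four metrics named in the abstract (accuracy, precision, recall, and F1-score) can be computed from a classifier's predictions as sketched below. This is a minimal illustration, not the study's evaluation code: it assumes binary labels with 1 marking hate speech, reports precision, recall, and F1 for that positive class only (the abstract does not state how the reported scores were averaged), and the `evaluate` function and the toy labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true, y_pred, positive=1):
    """Accuracy plus precision, recall, and F1 for the positive class.

    Hypothetical helper: `positive` is assumed to be the hate speech label.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one label")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged texts that are hate speech
    recall = tp / (tp + fn) if tp + fn else 0.0      # share of hate speech that was flagged
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Metrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels, not data from the study: 1 = hate speech, 0 = not hate speech.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    print(evaluate(y_true, y_pred))
```

Running the same function on each model's predictions over a shared test split gives the kind of side-by-side comparison the abstract reports. Libraries such as scikit-learn provide equivalent functions with explicit options for macro or weighted averaging across classes.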
Copyright (c) 2021 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author.
- The author acknowledges that the RESTI Journal (System Engineering and Information Technology) is the first publisher and publishes the work under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the version of the manuscript published in this journal (e.g., posting it to the author's institutional repository or publishing it in a book), provided they acknowledge that the manuscript was first published in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal.