Comparative Analysis of SVM, XGBoost and Neural Network on Hate Speech Classification

Keywords: Hate speech classification, machine learning, SVM, XGBoost, Neural Network

Abstract

On social media, hate speech is conveyed as text, images, and videos, and it can provoke people to break the law and harm others. Early detection of hate speech using machine learning algorithms is therefore necessary. This study analyzes the accuracy, precision, recall, and F1-score of three algorithms (SVM, XGBoost, and Neural Network) for hate speech classification, using datasets of public Indonesian-language hate speech sourced from Twitter. The results show that SVM achieves an accuracy of 83.2%, precision of 83%, recall of 83%, and F1-score of 83%, the highest of the three algorithms, so SVM can be considered for use in hate speech classification.
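
The four evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It assumes a binary labeling (1 = hate speech, 0 = not hate speech); the abstract does not state the label scheme or the averaging method used, so the function name `binary_metrics` and the toy labels below are hypothetical and are not the study's data or code.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels (assumption, not from the study's dataset).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Running the same function on each classifier's predictions over a shared test set would give one row of metrics per algorithm, which is the kind of comparison the abstract reports.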

Published
2021-10-24
How to Cite
Liang, S. (2021). Comparative Analysis of SVM, XGBoost and Neural Network on Hate Speech Classification. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 5(5), 896-903. https://doi.org/10.29207/resti.v5i5.3506
Section
Information Systems Engineering Articles