Sentiment Analysis for Detecting Cyberbullying Using TF-IDF and SVM

Wahyu Adi Prabowo; Fitriani Azizah

doi:10.29207/resti.v4i6.2753

Wahyu Adi Prabowo, Institut Teknologi Telkom Purwokerto
Fitriani Azizah, Institut Teknologi Telkom Purwokerto

Keywords: Preprocessing, Term Frequency and Inverse Document Frequency, Support Vector Machine, Confusion Matrix, Application, Sentiment Analysis

Abstract

Social media has become a primary means of communication in the digital era. Children and adults alike use it heavily to interact with others, and it has shifted much conventional communication to digital channels. This shift brings a serious problem: acts of cyberbullying are increasingly common. Cyberbullying can harm victims psychologically, causing depression and, in severe cases, suicide, and its dangers are a growing concern for the community. This study therefore performs sentiment analysis on social media comments to determine their sentiment value. The comment data passes through a preprocessing stage, is weighted with Term Frequency-Inverse Document Frequency (TF-IDF), and is classified with the Support Vector Machine (SVM) method. The data set consists of 1,500 comments collected by crawling with Python libraries and is divided into 80% training data and 20% testing data. On the test data, the method achieves an accuracy of 93%, a precision of 95%, and a recall of 97%. The study also presents a system model design that can be integrated with a browser, opening a user page that shows the classification of comments entered into the system.
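
The pipeline summarized in the abstract (preprocessing, TF-IDF weighting, SVM classification, an 80/20 split, and confusion-matrix metrics) could look roughly like the following sketch. It is not the authors' implementation. It assumes binary labels (1 for bullying, -1 for not bullying), a deliberately simple tokenizer in place of the paper's preprocessing, the smoothed IDF form ln((1 + n) / (1 + df)) + 1, and a linear SVM trained with Pegasos-style sub-gradient descent. All function names and the toy comments are hypothetical.

```python
import math
import random
from collections import Counter

def tokenize(text):
    # Minimal stand-in for preprocessing: lowercase, keep alphabetic tokens only.
    return "".join(c if c.isalpha() else " " for c in text.lower()).split()

def fit_idf(docs):
    # Smoothed IDF learned from the tokenized training documents only.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + k)) + 1.0 for term, k in df.items()}

def tfidf(doc, idf):
    # Raw term count times IDF, L2-normalized; terms unseen in training are dropped.
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def train_linear_svm(X, y, epochs=200, lam=0.01, seed=0):
    # Pegasos-style sub-gradient descent on the regularized hinge loss (no bias term).
    rng = random.Random(seed)
    w, step, order = {}, 0, list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            score = sum(w.get(t, 0.0) * v for t, v in X[i].items())
            for t in w:
                w[t] *= 1.0 - 1.0 / step  # shrinkage from the L2 regularizer
            if y[i] * score < 1:  # margin violated: move towards this example
                eta = 1.0 / (lam * step)
                for t, v in X[i].items():
                    w[t] = w.get(t, 0.0) + eta * y[i] * v
    return w

def predict(w, vec):
    # The sign of the linear score decides the class; a zero score maps to -1.
    return 1 if sum(w.get(t, 0.0) * v for t, v in vec.items()) > 0 else -1

def split_80_20(items, seed=0):
    # Shuffle reproducibly, then cut at 80% using integer arithmetic.
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = len(items) * 8 // 10
    return items[:cut], items[cut:]

def evaluate(y_true, y_pred):
    # Confusion-matrix counts with 1 as the positive class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == -1 and p == -1 for t, p in pairs)
    fp = sum(t == -1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == -1 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical toy comments, for illustration only (1 = bullying, -1 = not bullying).
data = [
    ("you are so stupid and ugly", 1), ("nobody likes you loser", 1),
    ("what an ugly stupid face", 1), ("shut up you worthless loser", 1),
    ("you are worthless and dumb", 1), ("great photo love it", -1),
    ("congratulations on the new job", -1), ("love this song so much", -1),
    ("what a great day with friends", -1), ("thanks for the lovely photo", -1),
]
train, test = split_80_20(data)
train_docs = [tokenize(text) for text, _ in train]
idf = fit_idf(train_docs)
w = train_linear_svm([tfidf(d, idf) for d in train_docs], [label for _, label in train])
predictions = [predict(w, tfidf(tokenize(text), idf)) for text, _ in test]
print(evaluate([label for _, label in test], predictions))
```

The IDF table is fitted on the training portion only, so no information from the test comments leaks into the features. With ten toy comments the printed metrics carry no meaning. The figures reported in the abstract come from the authors' 1,500-comment data set, not from this sketch.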
