Implementation of Rumor Detection on Twitter Using the SVM Classification Method

Annisa Rahmaniar Dwi Pratiwi; Erwin Budi Setiawan

doi:10.29207/resti.v4i5.2031

Annisa Rahmaniar Dwi Pratiwi, Telkom University
Erwin Budi Setiawan

Keywords: SVM, TF-IDF, Crawling, rumor, Twitter.

Abstract

Twitter is a popular social networking site, first launched in 2006, that allows users to spread information in real time. However, that information is not always based on facts and is sometimes deliberately used to spread rumors that cause public fear. Detection efforts are therefore needed to counter and prevent the spread of rumors on Twitter. Much research has addressed rumor detection, but it is largely limited to English and Chinese. In this study, the authors built a system to detect Indonesian-language rumors using SVM classification with TF-IDF weighting for feature selection. Data were collected from November 2019 to February 2020 by keyword-based crawling and were labeled manually. The dataset covers government-related and trending topics, comprises 47,449 records, and combines user-based and tweet-based features. The research stages begin with collecting data from Twitter, followed by preprocessing that consists of case folding, URL removal, normalization, stopword removal, and stemming. The subsequent stages are feature selection, N-gram modeling, classification, and evaluation using a confusion matrix. The system performed best in the test scenario that used 10% of the data for testing with unigram features, reaching a highest accuracy of 78.71%. The Twitter features that influenced rumor detection were the following count, the number of likes, and the number of mentions.
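The stages named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the stopword list, the regular expressions, the `tf * log(N / df)` weighting variant, and all function names are assumptions made for illustration. Normalization, stemming, and the SVM classifier itself are left out; the resulting TF-IDF vectors would be passed to an SVM library for the classification step.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny Indonesian stopword list for illustration only.
STOPWORDS = {"yang", "di", "dan", "ini", "itu"}

# Assumed URL pattern; the paper does not specify one.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")


def preprocess(text):
    """Apply case folding, URL removal, and stopword removal."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)
    tokens = re.findall(r"[a-z0-9]+", text)
    return [t for t in tokens if t not in STOPWORDS]


def ngrams(tokens, n=1):
    """Join every run of n consecutive tokens into one term."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Return one {term: weight} dict per document.

    Uses relative term frequency times log(N / df), one common
    TF-IDF variant (an assumption, not taken from the paper).
    """
    term_lists = [ngrams(preprocess(d), n) for d in docs]
    df = Counter(term for terms in term_lists for term in set(terms))
    total = len(docs)
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        vectors.append({
            term: (count / len(terms)) * math.log(total / df[term])
            for term, count in counts.items()
        })
    return vectors


def confusion_matrix(y_true, y_pred, positive="rumor"):
    """Count (tp, fp, fn, tn) with `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)
```

Calling `tfidf(docs, n=1)` gives unigram weights, the setting the abstract reports as performing best; `n=2` would give bigrams. A term that appears in every document receives weight 0 under this variant, which is why such terms contribute nothing to separating rumors from non-rumors.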
