Covid-19 Fake News Detection on Twitter Based on Author Credibility Using Information Gain and KNN Methods
Abstract
Twitter is a social media platform widely used to share information on topics that concern its users, including COVID-19, an infectious disease caused by SARS-CoV-2 whose pandemic is spreading across the world at an alarming rate. The World Health Organization (WHO) claims that the spread of COVID-19 is supported by the spread of false or fake news. A COVID-19 fake news detector is therefore needed so that users do not fall for circulating hoaxes. This study aims to classify COVID-19 news on Twitter based on author credibility. Credibility here means a person's perception of the validity of information; it is a multidimensional concept that recipients of information use to assess the source of a communication. The methods used are Information Gain and K-Nearest Neighbor (KNN). Information Gain ranks the most influential attributes, and KNN, a supervised learning algorithm, classifies each data point according to the labels of its nearest neighbors in the training data. The research covers data collection (crawling), data preprocessing, feature extraction, feature selection, splitting the data into training and testing sets, KNN classification, and evaluation. The study obtained an accuracy of 91%, a correlation of 0.115 between credibility and hoax, and a p-value < 0.005.
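The two methods named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three binary author features (verified, has_bio, posts_at_night), the toy rows, and the choices of k = 3 and two retained features are all hypothetical, picked only to show how Information Gain ranks attributes and how KNN then votes among the nearest training examples.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_features(rows, labels):
    """Return feature indices sorted from highest to lowest information gain."""
    n_features = len(rows[0])
    gains = [information_gain([r[j] for r in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: gains[j], reverse=True)

def knn_predict(train_rows, train_labels, query, k=3):
    """Majority vote among the k training rows closest to the query (Euclidean distance)."""
    neighbors = sorted(zip(train_rows, train_labels), key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

# Hypothetical author features: [verified, has_bio, posts_at_night]
rows = [
    [1, 1, 0], [1, 1, 1], [1, 0, 0], [1, 1, 1],  # authors of real news
    [0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1],  # authors of hoaxes
]
labels = ["real"] * 4 + ["hoax"] * 4

ranking = rank_features(rows, labels)   # feature selection step
selected = ranking[:2]                  # keep the two most informative features (arbitrary cut-off)

def project(row):
    """Keep only the selected feature columns of a row."""
    return [row[j] for j in selected]

train = [project(r) for r in rows]
prediction_a = knn_predict(train, labels, project([1, 1, 0]), k=3)
prediction_b = knn_predict(train, labels, project([0, 0, 1]), k=3)
print(ranking, prediction_a, prediction_b)
```

In this toy data the first feature separates the classes perfectly and the third carries no information, so feature selection discards the third before KNN is applied. With real tweet data the features would be continuous or high-dimensional, and both the number of retained features and k would need tuning on held-out data.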
Copyright (c) 2023 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author
- The author acknowledges that Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) is the first publisher of the work, under a Creative Commons Attribution 4.0 International License.
- Authors may separately arrange non-exclusive distribution of manuscripts published in this journal in other versions (e.g., deposit in the author's institutional repository or publication in a book), provided they acknowledge that the manuscript was first published in Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi).