Fake News (Hoax) Identification on Social Media Twitter using Decision Tree C4.5 Method

Identifikasi Berita Palsu (Hoax) pada Media Sosial Twitter dengan Metode Decision Tree C4.5

  • Brenda Irena, Telkom University
  • Erwin Budi Setiawan, Telkom University
Keywords: hoax, twitter, decision tree C4.5, TF-IDF, information gain

Abstract

Social media is a means for people to communicate and exchange information, and Twitter is one such platform. However, not all of the information disseminated is true; some news does not accord with the facts and is commonly called a hoax. There have been many cases of hoaxes spreading that cause concern and often harm a particular individual or group. In this research, the authors therefore build a system to identify hoax news on Twitter by applying the Decision Tree C4.5 classification method to 50,610 tweets. What distinguishes this research from earlier studies is its use of several test scenarios: classification only, classification with feature weighting, and classification with both feature weighting and feature selection. The weighting method is TF-IDF, and feature selection uses Information Gain. Features are generated using n-grams: unigrams, bigrams, and trigrams. The final results show that the scenario using feature weighting and feature selection gives the best accuracy, 72.91%, with a 90:10 split (90% training data, 10% test data) and 5,000 unigram features.
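The feature steps named in the abstract (n-gram generation, TF-IDF weighting, and Information Gain feature selection) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names are hypothetical, the TF-IDF variant shown (relative term frequency times the natural log of inverse document frequency) is one common form and may differ from the one used in the paper, and the C4.5 classifier itself is omitted.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list, each joined by a single space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """Weight each term of each document by tf * ln(N / df).

    docs is a list of term lists. This formula is an assumed, common
    variant; the abstract does not state the exact one used.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Entropy reduction from splitting documents on presence of a term."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Return the k terms with the highest information gain."""
    doc_sets = [set(d) for d in docs]
    vocab = sorted({t for d in doc_sets for t in d})
    ranked = sorted(vocab,
                    key=lambda t: information_gain(doc_sets, labels, t),
                    reverse=True)
    return ranked[:k]
```

In a pipeline like the one described, each tweet would be tokenized, expanded with `ngrams` for n = 1, 2, or 3, reduced to the top-ranked terms from `select_features` (5,000 in the best-performing scenario reported above), and weighted with `tf_idf` before being passed to a decision tree classifier.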

 

Published
2020-08-17
How to Cite
Irena, B., & Setiawan, E. B. (2020). Fake News (Hoax) Identification on Social Media Twitter using Decision Tree C4.5 Method. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 4(4), 711-716. https://doi.org/10.29207/resti.v4i4.2125
Section
Information Technology Articles