Fake News (Hoax) Identification on Social Media Twitter using Decision Tree C4.5 Method

  • Brenda Irena, Telkom University
  • Erwin Budi Setiawan, Telkom University
Keywords: hoax, twitter, decision tree C4.5, TF-IDF, information gain

Social media is a means for people to communicate and exchange information, and Twitter is one such platform. However, not all of the information disseminated there is true: some of it is false news, often called hoaxes. There have been many cases of hoaxes spreading, causing public concern and often harming particular individuals or groups. In this research, the authors therefore build a system that identifies hoax news on Twitter by applying the Decision Tree C4.5 classification method to 50,610 tweets. What distinguishes this research from previous studies is its use of several test scenarios: classification only, classification with feature weighting, and classification with both feature weighting and feature selection. The weighting method is TF-IDF, and feature selection uses Information Gain. The features are generated as n-grams: unigrams, bigrams, and trigrams. The final results show that the scenario using both feature weighting and feature selection produces the best accuracy, 72.91%, with a split of 90% training data and 10% test data (90:10) and 5,000 unigram features.
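The abstract names n-gram features and TF-IDF weighting but gives no formulas. The following is a minimal, dependency-free sketch of that step, assuming lowercased whitespace tokenization, raw term counts for TF, and idf = log(N / df); the paper's actual preprocessing and TF-IDF variant are not stated here, and the function names `ngrams` and `tf_idf` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return contiguous n-grams of a token list, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs, n=1):
    """Return one {n-gram: weight} dict per document.

    Assumption: lowercased whitespace tokenization, TF = raw count,
    IDF = log(N / df). Other TF-IDF variants (smoothing, normalization) exist.
    """
    counts = [Counter(ngrams(doc.lower().split(), n)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: how many documents contain each n-gram at least once.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    # Weight = term frequency in the document times inverse document frequency.
    return [
        {gram: tf * math.log(n_docs / df[gram]) for gram, tf in c.items()}
        for c in counts
    ]


# Hypothetical toy tweets, for illustration only (not the paper's data).
tweets = ["vaccine causes harm says doctor", "vaccine is safe says ministry"]
unigram_weights = tf_idf(tweets, n=1)  # n=1 unigrams, n=2 bigrams, n=3 trigrams
bigram_weights = tf_idf(tweets, n=2)
```

Under this formula, an n-gram that occurs in every tweet (here "vaccine" or "says") gets weight 0, so only terms that discriminate between tweets carry weight into the classifier.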
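Information Gain scores a feature by how much it reduces the entropy of the class labels, and C4.5 itself chooses tree splits by gain ratio (information gain divided by the split information of the partition). The sketch below assumes each TF-IDF feature is binarized to present/absent before scoring and that selection keeps the top k features (the abstract's best run used 5,000 unigrams); names such as `select_top_k` are hypothetical, and a full C4.5 tree (continuous thresholds, pruning) is not shown.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(feature_values, labels):
    """C4.5 split criterion: information gain normalized by split information."""
    split_info = entropy(feature_values)  # entropy of the partition itself
    if split_info == 0.0:
        return 0.0  # the feature does not split the data at all
    return information_gain(feature_values, labels) / split_info


def select_top_k(feature_matrix, labels, k):
    """Return indices of the k feature columns with the highest information gain.

    Assumption: a TF-IDF weight > 0 is treated as "present", otherwise "absent".
    """
    scores = []
    for j in range(len(feature_matrix[0])):
        present = [row[j] > 0 for row in feature_matrix]
        scores.append((information_gain(present, labels), j))
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest gain first, ties by index
    return [j for _, j in scores[:k]]
```

Ranking by information gain is cheap (one pass per feature), which is why it is a common filter before training a tree on thousands of sparse n-gram features.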

Irena, B., & Setiawan, E. B. (2020). Fake News (Hoax) Identification on Social Media Twitter using Decision Tree C4.5 Method. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 4(4), 711-716. https://doi.org/10.29207/resti.v4i4.2125
