The Impact of Feature Extraction in Random Forest Classifier for Fake News Detection

Dhani Ariatmanto; Anggi Muhammad Rifai

doi:10.29207/resti.v8i6.6017

Dhani Ariatmanto, Universitas AMIKOM Yogyakarta
Anggi Muhammad Rifai, Universitas Pelita Bangsa

Keywords: fake news, Random Forest, text classification, machine learning, feature extraction

Abstract

Fake news spreads rapidly on online platforms, causing a concerning dissemination of misinformation. Its influence has become a pressing social problem, shaping public opinion during important events such as elections. This research focuses on detecting and classifying fake news with the Random Forest algorithm and investigates the impact of feature extraction on classification accuracy, specifically using the TF-IDF method. For this purpose, we used 44,898 English-language articles from the ISOT fake news dataset. The dataset was cleaned using tokenization and stemming, then split into 75% training and 25% testing data. A TF-IDF vectorizer was applied as the feature extraction step to convert the text into numeric features, and a Random Forest classifier was implemented to predict whether a news article is real or fake. The proposed model is compared with existing models to assess its contribution to overall classification precision. The combination of the TF-IDF vectorizer and Random Forest achieved an accuracy of 99.0%, which highlights an effective strategy for combating misinformation through precise text classification.
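
As a rough illustration of the feature extraction step described above, the following minimal sketch computes TF-IDF weights by hand using only the Python standard library. It is not the authors' implementation. The toy corpus, the function names (`tokenize`, `fit_idf`, `tfidf_vector`), the whitespace tokenizer, and the smoothed IDF variant ln((1 + n) / (1 + df)) + 1 with L2 normalisation are all assumptions made for this example; stemming is left out for brevity.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the study itself used 44,898 ISOT articles.
DOCS = [
    "senate passes budget bill after long debate",
    "shocking secret cure doctors hate revealed",
    "president signs budget bill into law",
    "shocking photo reveals secret alien base",
]


def tokenize(text):
    """Lowercase and split on whitespace (stemming omitted for brevity)."""
    return text.lower().split()


def fit_idf(token_lists):
    """Smoothed IDF per term: ln((1 + n) / (1 + df)) + 1 (assumed variant)."""
    n = len(token_lists)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Raw term count times IDF, L2-normalised; terms without an IDF are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


token_lists = [tokenize(d) for d in DOCS]
idf = fit_idf(token_lists)
vectors = [tfidf_vector(tokens, idf) for tokens in token_lists]
```

Terms that occur in fewer documents receive a larger weight, so in the first toy document "senate" outweighs "budget", which also appears in the third. In a full pipeline, sparse vectors of this kind would be passed to a Random Forest classifier (for example scikit-learn's `RandomForestClassifier`) after the 75%/25% train/test split.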
