Applying the SMOTE Technique to Address Class Imbalance in Online News Objectivity Classification Using the KNN Algorithm
Abstract
The growing volume of information published as online news needs to be matched by readers' ability to distinguish subjective from objective news. A dedicated system for classifying the objectivity of online news is therefore needed to help readers tell the two apart. This research proposes a machine learning approach that classifies news objectivity automatically based on the content of the news. The proposed algorithm is K-Nearest Neighbor (KNN). The news samples, obtained from kompas.com by scraping, exhibit class imbalance: the numbers of objective and subjective news items are not balanced, which can affect the performance of the classification algorithm. One technique for overcoming class imbalance is the Synthetic Minority Over-sampling Technique (SMOTE). SMOTE generates synthetic minority-class data until the minority class is as large as the majority class. This study compares the performance of the KNN algorithm without SMOTE and with SMOTE. Using neighbor counts k of 1, 3, 5, 7, and 9, the study found that applying SMOTE improved the accuracy of the KNN algorithm at k = 1 and k = 3, with an average increase of 3.36. At k = 5, 7, and 9, accuracy decreased by an average of 6.67.
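The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors are made-up 2-D points rather than features extracted from kompas.com articles, and the helper names (`nearest`, `smote`, `knn_predict`) and the choice of Euclidean distance are assumptions made for illustration. The sketch oversamples the minority class by interpolating between a minority sample and one of its nearest minority neighbours until both classes are the same size, then classifies a query point with KNN for each k used in the study.

```python
import math
import random


def nearest(point, pool, k):
    """Return the k points in pool closest to point (Euclidean distance)."""
    return sorted(pool, key=lambda q: math.dist(point, q))[:k]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbour = rng.choice(nearest(base, others, k))
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def knn_predict(train, labels, query, k):
    """Majority vote among the k training points nearest to query."""
    ranked = sorted(zip(train, labels), key=lambda tl: math.dist(tl[0], query))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


# Toy, made-up 2-D feature vectors (assumption: not data from the study).
objective = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.1, 0.4), (0.4, 0.1)]
subjective = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]  # minority class

# Oversample the minority class until both classes have the same size.
extra = smote(subjective, n_new=len(objective) - len(subjective), k=2)
balanced_X = objective + subjective + extra
balanced_y = ["objective"] * len(objective) + ["subjective"] * (len(subjective) + len(extra))

# Classify one query point with each neighbour count used in the study.
for k in (1, 3, 5, 7, 9):
    print(k, knn_predict(balanced_X, balanced_y, (1.0, 0.9), k))
```

On real data the comparison would be repeated on a held-out test set, once with the original training set and once with the oversampled one, so that the synthetic samples never leak into evaluation.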
Copyright (c) 2019 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author
- The author acknowledges that the RESTI Journal (Rekayasa Sistem dan Teknologi Informasi) is the first publisher, under a Creative Commons Attribution 4.0 International License.
- Authors may separately arrange non-exclusive distribution of the manuscript published in this journal in other versions (e.g., deposit in the author's institutional repository or publication in a book), provided they acknowledge that the manuscript was first published in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal.