Application of the SMOTE Technique to Handle Class Imbalance in the Objectivity Classification of Online News Using the KNN Algorithm

  • Anis Nikmatul Kasanah, Karang rejo, Garum, Blitar
  • Muladi Muladi, Universitas Negeri Malang
  • Utomo Pujianto, Universitas Negeri Malang
Keywords: classification, KNN, SMOTE, subjective, objective

Abstract

The growing amount of information published as online news needs to be matched by readers' ability to sort news into subjective and objective categories. A dedicated system for classifying the objectivity of online news is therefore needed to help readers distinguish subjective news from objective news. This research proposes applying machine learning to classify news objectivity automatically from the content of the news, using the K-Nearest Neighbor (KNN) algorithm. The news samples, collected from kompas.com by web scraping, exhibit class imbalance: the numbers of objective and subjective news articles are not balanced, which can degrade the performance of the classification algorithm. One technique for overcoming class imbalance is the Synthetic Minority Over-sampling Technique (SMOTE), which generates synthetic minority-class samples until the minority class is as large as the majority class. This study compares the performance of the KNN algorithm with and without SMOTE. Testing with neighbor counts k = 1, 3, 5, 7, and 9 showed that applying SMOTE improved the accuracy of KNN at k = 1 and k = 3, with an average increase of 3.36, whereas at k = 5, 7, and 9 accuracy decreased by an average of 6.67.
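The abstract summarizes SMOTE in a single sentence. The following is a minimal, hedged sketch in Python of how SMOTE oversampling and KNN voting fit together; it is not the authors' implementation. It assumes each news article has already been converted into a numeric feature vector (the feature extraction used in the study is not described here), uses Euclidean distance, and picks base samples at random, a common simplification of the original SMOTE formulation by Chawla et al. The function names `smote`, `balance_with_smote`, and `knn_predict` and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k):
    """Label `query` by majority vote among its k nearest training vectors.

    With two classes, an odd k (1, 3, 5, 7, 9) cannot produce a tied vote.
    """
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def smote(minority, n_synthetic, k=5, rng=None):
    """Create n_synthetic minority vectors by interpolating between a random
    minority sample and one of its k nearest minority-class neighbours.

    Simplification (assumption): base samples are drawn at random instead of
    cycling through every minority sample a fixed number of times.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_synthetic):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        others = [p for j, p in enumerate(minority) if j != idx]
        neighbours = sorted(others, key=lambda p: euclidean(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the base-neighbour segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance_with_smote(X, y, k=5, seed=0):
    """Oversample the minority class until it is as large as the majority class.

    Written for a two-class problem; returns new lists and leaves X, y unchanged.
    """
    (_, n_major), (minor, n_minor) = Counter(y).most_common(2)
    minority = [x for x, label in zip(X, y) if label == minor]
    new_points = smote(minority, n_major - n_minor, k=k, rng=random.Random(seed))
    return X + new_points, y + [minor] * len(new_points)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data: 2-D points standing in for document feature vectors.
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
    X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(8)]
    y = ["objective"] * 40 + ["subjective"] * 8

    X_bal, y_bal = balance_with_smote(X, y, k=5)
    print(Counter(y_bal))  # both classes now have 40 samples
    for k in (1, 3, 5, 7, 9):
        print(k, knn_predict(X_bal, y_bal, [2.0, 2.0], k))
```

When comparing KNN with and without SMOTE in this way, `balance_with_smote` should be applied to the training split only: synthesizing points before splitting lets information from test samples leak into the training set and can inflate the measured accuracy.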

Published: 2019-08-02
Section: Artikel Rekayasa Sistem Informasi (Information Systems Engineering Articles)