Optimization of K Value in KNN Algorithm for Spam and Ham Email Classification
Optimasi Nilai K pada Algoritma KNN untuk Klasifikasi Spam dan Ham Email
Abstract
Email is frequently abused in ways that can harm recipients. Such abuse is commonly known as spam and includes unsolicited advertisements, phishing scams, and even malware. This study aims to classify email as spam or ham using the K-nearest neighbor (KNN) method, as an effort to reduce the amount of spam. KNN classifies an email as spam or ham, and the classification is tested with different values of K. Evaluation with a confusion matrix shows that KNN with K = 1 achieves the highest accuracy, 91.4%. The study further finds that optimizing the K value using frequency distribution clustering yields an accuracy of 100%, while k-means clustering yields an accuracy of 99%. Based on these accuracy values, frequency distribution clustering and k-means clustering can both be used to determine the optimal K value for KNN in spam email classification.
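The evaluation loop described in the abstract (run KNN with several K values, score each with a confusion matrix, keep the best) can be illustrated with a minimal sketch. It assumes two-dimensional toy feature vectors in place of the actual email features, which the abstract does not specify, and all function and variable names here are hypothetical. It does not reproduce the frequency distribution or k-means clustering step used in the study.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Label x by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to x.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def confusion_matrix(y_true, y_pred, positive="spam"):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of correct predictions."""
    return (tp + tn) / (tp + fp + fn + tn)


def best_k(train_X, train_y, test_X, test_y, candidate_ks):
    """Score every candidate K on the test set and return (best K, all scores)."""
    scores = {}
    for k in candidate_ks:
        preds = [knn_predict(train_X, train_y, x, k) for x in test_X]
        scores[k] = accuracy(*confusion_matrix(test_y, preds))
    # max() keeps the first maximum, so ties go to the earliest candidate.
    return max(scores, key=scores.get), scores


# Toy data (assumption): each vector stands in for two term weights of an email.
train_X = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.1), (0.1, 0.9), (0.2, 0.8), (0.1, 0.7)]
train_y = ["spam", "spam", "spam", "ham", "ham", "ham"]
test_X = [(0.85, 0.15), (0.15, 0.85)]
test_y = ["spam", "ham"]

k_star, scores = best_k(train_X, train_y, test_X, test_y, [1, 3, 5])
print(k_star, scores)
```

On real data the candidate K values would normally be scored on a held-out validation split or with cross-validation rather than on the final test set, so that the reported accuracy is not biased by the choice of K.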
Copyright (c) 2020 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author
- The author acknowledges that the RESTI Journal (System Engineering and Information Technology) is the first publisher of the work, under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the version of the manuscript published in this journal (e.g., depositing it in the author's institutional repository or publishing it in a book), provided they acknowledge that the manuscript was first published in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal.