Optimization of K Value in KNN Algorithm for Spam and Ham Email Classification

  • Eko Laksono, Brawijaya University
  • Achmad Basuki, Brawijaya University
  • Fitra Bachtiar, Brawijaya University
Keywords: classification, email spam, KNN, frequency distribution clustering, k-means clustering

Abstract

Email abuse is common and can harm its recipients. This abuse, commonly known as spam, includes advertisements, phishing scams, and even malware. This study classifies emails as spam or ham (legitimate email) using the K-Nearest Neighbor (KNN) method, as an effort to reduce the amount of spam. KNN labels an email as spam or ham from the labels of its K most similar training emails, and the study compares the results obtained with different values of K. Evaluation with a confusion matrix showed that KNN with K = 1 achieved the highest accuracy, 91.4%. The study further found that optimizing the K value of KNN with frequency distribution clustering produced an accuracy of 100%, while k-means clustering produced an accuracy of 99%. Based on these accuracy values, both frequency distribution clustering and k-means clustering can be used to find the optimal K value for KNN in spam email classification.
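The following is a minimal sketch of the pipeline the abstract describes: KNN classification of spam and ham, a sweep over K, and accuracy computed from a confusion matrix. It is not the authors' implementation. It assumes term-frequency vectors compared by cosine similarity, simple lowercase whitespace tokenization, and a hypothetical six-email training set (`TRAIN`, `TEST`, and all function names are illustrative). The abstract does not state the paper's preprocessing, feature weighting, or distance measure, and the frequency distribution clustering and k-means procedures used to choose K are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical toy corpus; NOT the dataset used in the paper.
TRAIN = [
    ("win a free prize now", "spam"),
    ("claim your free cash prize", "spam"),
    ("cheap pills buy now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the attached report", "ham"),
    ("lunch on monday with the team", "ham"),
]
TEST = [
    ("free prize waiting claim now", "spam"),
    ("team meeting moved to monday", "ham"),
    ("buy cheap cash now", "spam"),
    ("report review before lunch", "ham"),
]


def tokenize(text):
    # Assumed preprocessing: lowercase + whitespace split (no stemming or stop-word removal).
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Training documents as (term-frequency vector, label) pairs.
TRAIN_VECS = [(Counter(tokenize(text)), label) for text, label in TRAIN]


def knn_predict(train_vecs, text, k):
    """Label `text` by majority vote among its k most similar training documents."""
    query = Counter(tokenize(text))
    # sorted() is stable, so equally similar documents keep their corpus order.
    nearest = sorted(train_vecs, key=lambda item: cosine(item[0], query), reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    # On a tied vote, most_common(1) returns the label seen first, i.e. the nearest neighbour's.
    return votes.most_common(1)[0][0]


def confusion_matrix(y_true, y_pred, positive="spam"):
    """Return (TP, FP, FN, TN), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    # Share of all predictions that are correct: (TP + TN) / total.
    return (tp + tn) / (tp + fp + fn + tn)


def evaluate(k):
    """Accuracy of KNN on the toy TEST split for a given k."""
    y_true = [label for _, label in TEST]
    y_pred = [knn_predict(TRAIN_VECS, text, k) for text, _ in TEST]
    return accuracy(*confusion_matrix(y_true, y_pred))


if __name__ == "__main__":
    # Sweep candidate K values and report accuracy for each.
    for k in (1, 3, 5):
        print(f"K = {k}: accuracy = {evaluate(k):.2f}")
```

On this toy data, K = 1 and K = 3 classify all four test emails correctly, while K = 5 misclassifies both ham test emails because the vote is padded with unrelated spam neighbours. This only illustrates why the choice of K matters; it says nothing about the paper's dataset or reported accuracies.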

Published: 2020-04-20

How to Cite: Laksono, E., Basuki, A., & Bachtiar, F. (2020). Optimization of K Value in KNN Algorithm for Spam and Ham Email Classification. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 4(2), 377–383. https://doi.org/10.29207/resti.v4i2.1845

Section: Information Technology Articles