Comparison of the TF-ABS and TF-IDF Methods in Helpdesk Text Classification Using K-Nearest Neighbor
Abstract
Distributing tickets to the correct destination unit is a core function of a helpdesk application. When admin officers distribute tickets manually, tickets can be misrouted, and completion time increases when the number of tickets is large. Classifying helpdesk text automatically allows tickets to be sent to the appropriate destination unit in a short time. This study compares the performance of helpdesk text classification at the Directorate General of State Assets of the Ministry of Finance using the K-Nearest Neighbor (KNN) method with two term-weighting methods, TF-ABS and TF-IDF. The research stages were complaint document collection, preprocessing, term weighting, feature reduction, classification, and testing. KNN was run with the n_neighbor (k) parameter set to k=1, k=3, k=5, k=7, k=9, k=11, k=13, k=15, k=17, and k=19 to classify 10,537 helpdesk texts into 8 categories. Testing used a confusion matrix, with accuracy and F1-score as the evaluation measures. The results show that TF-ABS weighting outperforms TF-IDF, with the highest accuracy of 90.04% at 15% and k=3.
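The weighting, classification, and testing stages described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the toy tickets, the category names, the smoothed IDF variant, and the choice of cosine similarity are all assumptions made for illustration, and TF-ABS weighting and the feature reduction step are not shown.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (sparse TF-IDF dict per tokenized document, IDF table)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed IDF variant: log(N / df) + 1, so terms in every document keep some weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vecs, idf

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(train_vecs, train_labels, query, k):
    """Majority vote among the k training documents most similar to the query."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def evaluate(train, test, ks):
    """Accuracy on the test set for each candidate k."""
    train_vecs, idf = tfidf_vectors([text.split() for text, _ in train])
    labels = [label for _, label in train]
    results = {}
    for k in ks:
        correct = 0
        for text, gold in test:
            # Weight the query with the IDF learned from training data only.
            counts = Counter(t for t in text.split() if t in idf)
            query = {t: c * idf[t] for t, c in counts.items()}
            correct += knn_predict(train_vecs, labels, query, k) == gold
        results[k] = correct / len(test)
    return results

# Hypothetical tickets and categories, invented for this example.
train = [
    ("password reset login failed", "account"),
    ("cannot login account locked", "account"),
    ("forgot password account", "account"),
    ("printer not printing paper jam", "hardware"),
    ("printer screen flicker", "hardware"),
    ("printer driver paper error", "hardware"),
]
test = [
    ("login password locked", "account"),
    ("paper jam printer", "hardware"),
]

results = evaluate(train, test, ks=(1, 3, 5))
for k, acc in results.items():
    print(f"k={k}: accuracy={acc:.2f}")
```

Sweeping k over odd values, as the study does from 1 to 19, avoids tied votes in the two-class case; with 8 categories ties remain possible, and this sketch resolves them in favor of the label whose neighbor ranks nearest.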
Copyright (c) 2021 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author.
- The author acknowledges that Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) is the first publisher of the article, under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the published version of the manuscript (e.g., depositing it in the author's institutional repository or publishing it in a book), provided they acknowledge that the manuscript was first published in Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi).