Analysis of Naive Bayes Classification Algorithm Optimization Using Genetic Algorithm and Bagging

Agung Nugroho; Yoga Religia

doi:10.29207/resti.v5i3.3067

Agung Nugroho, Universitas Pelita Bangsa
Yoga Religia, Universitas Pelita Bangsa

Keywords: Classification, Bank Marketing, Naive Bayes, Bagging, Genetic Algorithm

Abstract

Growing demand for bank credit has pushed the banking sector toward more sophisticated techniques for analyzing credit risk. One such technique is data mining, which extracts meaningful information from large amounts of data through classification. Bank marketing data, however, is imbalanced, so classification results on it tend to be suboptimal. Naïve Bayes is one classification algorithm that can be applied to imbalanced data and generally performs well, but it still needs optimization to yield better results. Several optimization approaches for handling imbalanced data have been developed, among them bagging and genetic algorithms. This study compares the accuracy of the naïve Bayes algorithm after optimization with bagging and with a genetic algorithm. The results show that combining bagging with a genetic algorithm improves the performance of naïve Bayes by 4.57%.
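
To make the bagging idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a Gaussian naïve Bayes base learner, a small synthetic two-feature imbalanced dataset in place of the bank marketing data, and majority voting over bootstrap samples. The genetic algorithm step is not shown, and all function names (fit_gaussian_nb, fit_bagging, make_imbalanced, and so on) are hypothetical.

```python
import math
import random
from collections import Counter


def fit_gaussian_nb(X, y):
    """Fit Gaussian naive Bayes: per-class prior plus per-feature mean and variance."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            # A small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-6
            stats.append((mean, var))
        model[label] = (len(rows) / len(y), stats)
    return model


def predict_nb(model, x):
    """Return the class with the highest log-posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for v, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def fit_bagging(X, y, n_models=15, rng=None):
    """Train one naive Bayes model per bootstrap sample (drawn with replacement)."""
    rng = rng or random.Random(0)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_gaussian_nb([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, x):
    """Combine the base models by majority vote."""
    votes = Counter(predict_nb(m, x) for m in models)
    return votes.most_common(1)[0][0]


def make_imbalanced(n_major=180, n_minor=20, seed=42):
    """Synthetic stand-in data (an assumption): 90% class 0, 10% class 1."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_major):
        X.append([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
        y.append(0)
    for _ in range(n_minor):
        X.append([rng.gauss(2.5, 1.0), rng.gauss(2.5, 1.0)])
        y.append(1)
    return X, y


X, y = make_imbalanced()
models = fit_bagging(X, y, n_models=15, rng=random.Random(1))
preds = [predict_bagging(models, x) for x in X]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(f"training accuracy of bagged naive Bayes: {accuracy:.3f}")
```

The accuracy printed here is measured on the training data and serves only to show the mechanics; it says nothing about the 4.57% improvement reported in the study, which would require the original dataset and evaluation protocol.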
