Analisis Perbandingan Algoritma Optimasi pada Random Forest untuk Klasifikasi Data Bank Marketing
Comparative Analysis of Optimization Algorithms in Random Forest for Classification of Bank Marketing Data
In banking, marketers are expected to reduce lending risk by keeping their customers from falling into non-performing loans. One way to reduce this risk is to use data mining techniques. Data mining offers powerful techniques, classification among them, for extracting meaningful and useful information from large amounts of data. Random Forest (RF) is a classification algorithm that can handle class imbalance problems. However, several references state that an optimization algorithm is needed to improve the classification results of the RF algorithm. The RF algorithm can be optimized using Bagging and the Genetic Algorithm (GA). This study aims to classify Bank Marketing data on loan application acceptance, taken from the www.data.world site. Classification is carried out using the RF algorithm to obtain a predictive model of loan application acceptance with optimal accuracy. The study also compares RF optimized with Bagging against RF optimized with the Genetic Algorithm. The test results show that the best classification performance on the Bank Marketing data was obtained with the RF algorithm, with an accuracy of 88.30%, AUC (+) of 0.500 and AUC (-) of 0.000. Neither Bagging nor the Genetic Algorithm improved the performance of the RF algorithm on this classification task.
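The two evaluation measures named in the abstract, accuracy and AUC, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the study's pipeline: the 883/117 class split, the constant-score model, and the helper names `accuracy` and `auc` are all hypothetical and chosen only to show how the two measures can diverge on an imbalanced sample.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced sample: 883 negatives and 117 positives (assumed ratio).
y_true = [0] * 883 + [1] * 117

# A degenerate model that gives every record the same score and predicts class 0.
constant_scores = [0.0] * len(y_true)
constant_preds = [0] * len(y_true)

acc = accuracy(y_true, constant_preds)
area = auc(y_true, constant_scores)
print(f"accuracy = {acc:.3f}, AUC = {area:.3f}")
```

On an imbalanced sample, a model that never predicts the minority class still scores high on accuracy while its AUC stays at chance level, which is why the two measures are usually reported together.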
Copyright (c) 2021 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright of each article belongs to the authors.
- Authors acknowledge that Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) is the publisher of first publication, under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for non-exclusive distribution of the version of the manuscript published in this journal (e.g., depositing it in the author's institutional repository or publishing it in a book), with an acknowledgment that the manuscript was first published in Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi).