Detection of Credit Card Fraud with Machine Learning Methods and Resampling Techniques
Abstract
Banks and other financial institutions offer credit cards, and despite advances in technology, fraud in credit card transactions remains common, so a system is needed that can detect fraudulent transactions quickly and accurately. This study therefore aims to classify transactions as fraudulent or not. The proposed method is ensemble learning, tested with three boosting variants: XGBoost, Gradient Boosting, and AdaBoost. To improve model performance, the Synthetic Minority Oversampling Technique (SMOTE) from the imbalanced-learn (imblearn) library is applied to the training data to handle the class imbalance. The dataset used in this study, titled "Credit Card Fraud Detection", contains 284,807 transactions divided into two classes: Not Fraud and Fraud. The proposed model achieved a recall of 92% with Gradient Boosting, an increase of 10.37 percentage points over a previous study that used Random Forest and reported a recall of 81.63%. This improvement is attributed to the use of SMOTE on the training data, which strongly influences the classification of the Not Fraud and Fraud classes.
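The oversampling step described in the abstract can be illustrated with a minimal sketch. The study itself uses the SMOTE implementation from the imblearn library; the code below is not that implementation. It is a simplified, standard-library-only version of the core idea (interpolating between a minority sample and one of its nearest minority neighbours), and the function name `smote_sketch`, the toy data, and the parameter values are all assumptions made for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Simplified illustration of the SMOTE idea, not the imblearn implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (itself excluded)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy training split: label 1 = Fraud (minority), 0 = Not Fraud.
X_train = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.2], [0.1, 0.1],
           [2.0, 2.1], [2.2, 1.9]]
y_train = [0, 0, 0, 0, 0, 0, 1, 1]

fraud = [x for x, y in zip(X_train, y_train) if y == 1]
n_needed = y_train.count(0) - y_train.count(1)

# Oversample the training split only; a held-out test split would stay untouched.
X_balanced = X_train + smote_sketch(fraud, n_needed, k=5)
y_balanced = y_train + [1] * n_needed
```

Because each synthetic point lies on a line segment between two real Fraud samples, the balanced training set gains minority examples without exact duplication. Restricting the step to the training data, as the abstract describes, keeps the evaluation data at its original class distribution.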
Copyright (c) 2022 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author
- The author acknowledges that Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) is the first publisher of the work, under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the version of the manuscript published in this journal (e.g., depositing it in an institutional repository or publishing it in a book), with an acknowledgement that the manuscript was first published in Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi).