Detection of Credit Card Fraud with Machine Learning Methods and Resampling Techniques

Moh. Badris Sholeh Rahmatullah; Aulia Ligar Salma Hanani; Akmal Muhammad Naim; Zamah Sari; Yufis Azhar

doi:10.29207/resti.v6i6.4213

Moh. Badris Sholeh Rahmatullah Universitas Muhammadiyah Malang
Aulia Ligar Salma Hanani Universitas Muhammadiyah Malang
Akmal Muhammad Naim Universitas Muhammadiyah Malang
Zamah Sari Universitas Muhammadiyah Malang https://orcid.org/0000-0002-1247-2414
Yufis Azhar Universitas Muhammadiyah Malang


Keywords: machine learning, ensemble learning, classification, resampling, credit card fraud

Abstract

Banks provide credit cards to their customers, but despite advances in technology, fraud in credit card transactions remains common, so a system is needed that can detect fraudulent transactions quickly and accurately. This study therefore aims to classify transactions as fraudulent or not. The proposed method is Ensemble Learning of the Boosting type, tested in three variations: XGBoost, Gradient Boosting, and AdaBoost. To maximize model performance, the Synthetic Minority Oversampling Technique (SMOTE) from the Imblearn library is applied to the training data to handle the class imbalance in the dataset. The dataset used in this study is "Credit Card Fraud Detection", which contains 284,807 transactions divided into two classes: Not Fraud and Fraud. The proposed model achieved a recall of 92% with Gradient Boosting, an improvement of 10.37 percentage points over a previous study that used Random Forest and reported a recall of 81.63%. The improvement is attributed to applying SMOTE to the training data, which strongly influences how the Not Fraud and Fraud classes are classified.
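The resampling step described in the abstract can be illustrated with a minimal sketch. In practice the `SMOTE` class in `imblearn.over_sampling` performs this step; the dependency-free version below only mirrors the core idea (interpolating between a minority sample and one of its nearest minority neighbours) and is not the authors' implementation. The helper names `smote` and `recall`, the toy feature values, and the choice of `k=2` are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority rows
    and their k nearest minority neighbours (needs at least 2 minority rows)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the positive (Fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy training split: 8 "Not Fraud" rows (0) and 3 "Fraud" rows (1).
train_X = [[0.1 * i, 0.2 * i] for i in range(8)]
train_X += [[5.0, 5.0], [5.5, 4.5], [6.0, 5.5]]
train_y = [0] * 8 + [1] * 3

# Oversample the Fraud class in the training data only, until classes match.
fraud_rows = [x for x, y in zip(train_X, train_y) if y == 1]
n_needed = train_y.count(0) - train_y.count(1)
new_rows = smote(fraud_rows, n_needed, k=2)

balanced_X = train_X + new_rows
balanced_y = train_y + [1] * len(new_rows)
```

Only the training split is resampled here; the test split would keep its original class ratio so that recall is measured on realistic data. The reported gain is a difference in percentage points: 92 − 81.63 = 10.37.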


