Detection of Credit Card Fraud with Machine Learning Methods and Resampling Techniques

Financial institutions in the form of banks provide facilities in the form of credit cards, but with the development of technology, fraud on credit card transactions is still common, so a system is needed that can detect fraud transactions quickly and accurately. Therefore, this study aims to classify fraudulent transactions. The proposed method is Ensemble Learning which will be tested using the Boosting type with 3 variations, namely XGBoost, Gradient Boosting, and AdaBoost. Then, to maximize the performance of the model, the dataset used is optimized with the Synthetic Minority Oversampling Technique (SMOTE) function from the Imblearn library in the data train to handle imbalanced dataset conditions. The dataset used in this study is entitled "Credit Card Fraud Detection" with a total of 284807 data which is divided into two classes: Not Fraud and Fraud. The proposed model received a recall of 92% with Gradient Boosting, where the results increased by 10.37% compared to the previous study using Random Forest with a recall result of 81.63%. This is because the use of SMOTE in the data train greatly influences the classification of Not fraud and fraud classes.


Introduction
Banking is a system of economic scope that serves the public in the field of financial services.Most of the population in parts of the world use financial services in the form of banks, one of the banking efforts in determining the services needed by the community with the convenience of a facility called a credit card.However, it is very unfortunate that currently there are still many misuses of credit cards which are commonly referred to as carding or card fraud [1].
In the European Central Bank website publication, the total value of fraudulent transactions using credit cards published in the Single Euro Payments Area (SEPA) and obtained worldwide amounted to 1.87 billion euros in 2019 [3].This is a fairly serious problem because credit card fraud losses are quite large.Therefore, to make transactions successful in the world of digital payments, credit card fraud detection must be considered [2].
To ensure fraud in credit card transactions, a classification process using machine learning technology is needed [2].Machine learning aims to train machines to handle large amounts of data more efficiently, which is expected to be developed to make it easier to determine, search, and share data.It begins by dividing the data into two main parts, namely training data and test data [3].Machine learning technology can be used to perform classification and detection with a fairly good success rate [4].
In machine learning, the type used is supervised learning or machine learning with labels that can later predict complete data patterns.In supervised learning, using any potential feature that increases the predictiveness of the model to meet certain requirements [5].When there is new data in the extraction process, the features in the new data will be matched with the model pattern obtained from the data label [4].Each label will be compared with the classification of data labels which will produce output in the form of classification results.In practice, one approach that is superior in machine learning is ensemble learning [4].
The three types of ensemble learning include Bagging, Stacking, and Boosting.To dig deeper into bagging means to combine many Decision Trees on different samples from the same dataset, with the result being a predictive average.Stacking involves many types of models from the same data as well as using other models to learn the best combination of predictions.Boosting adds consecutive ensemble members, corrects the predictions made by the previous model and produces an average weight of the predictions [6].
Boosting is learning the sequence adaptively, namely the basic model that depends on the previous model and follows a deterministic strategy [4].In this study, three boosts were used, including Gradient Boosting, AdaBoost, and XGBoost.
In this study, the dataset used is the same as the previous study [7], but previous studies used three different methods, namely Naive Bayes, LR, and Random Forest.The best method in this study is Random Forest with 99.96% accuracy, 96.38% precision, and 81.63% recall [7].
From the results of previous studies [7], the resulting accuracy is very good, but the recall value of 81.63% still needs to be improved again.Because the recall value is needed to detect fake transactions, where recall means counting the number of true fake transactions, which are caught in the model in positive labels.This is the main topic in the discussion of credit card fraud detection research.
Based on the description above, the purpose of this study is to increase the recall value from previous studies using three boosting scenarios and to try to determine the effectiveness with and without using SMOTE in the effect of the overall recall value.With the hope that the system created can provide benefits in accurately detecting fake credit cards and reducing the frequency of credit card fraud.Recall is considered more important for automating engineering task requirements than precision.Recall is the percentage of correct answers based on all possible correct answers [8].

Research Stages
The research method carried out has several stages as shown in Figure 1.The first stage is entering data which is then preprocessed by removing unused attributes.The next stage is data splitting, dividing the dataset into 80% train data and 20% test data.The third stage is divided into 2 scenarios, namely resampling using SMOTE so that both fraud and non-fraud classes are balanced.And without using SMOTE.Then proceed to the modeling section.The last stage is the evaluation of the results by paying attention to the confusion matrix and classification report.

Dataset
The data used in this study is the same dataset from previous research [7] which is in the form of tabular data on European Cardholders' credit card transactions which can be accessed via Kaggle.The total amount of data is 284807 data.In this dataset, the amount of data between fraud and not-fraud classes is too unbalanced, where 0.173% of the total transactions are fake [7] with a total of 492 fake transactions and 244315 legal transactions.Then, in this study the data set will be divided into 80:20, where 80% is train data and 20% is test data.An example of the top five datasets can be seen in Table 1.

Table 1. Example dataset
This dataset only contains numeric input variables that have the features "Time, V1, V2, V3, up to V28, Amount, and Class".Values V1 to V28 are using credit card transaction data at certain bank agencies, a statement to the dataset provider on the Kaggle website published by the machine learning group -ULB [9], that the provider does not provide original features and background information about the dataset.This is intended to maintain the confidentiality of credit card users.The original confidential dataset is then transformed using the PCA (Principal Component Analysis) method, which reduces the dimensions of the data without reducing the characteristics of the data [10].Then leave the feature "time, amount, class" which is not transformed by PCA.Visualization of data imbalance between fake transaction data and genuine transactions can be seen in Figure 2. The problem with the dataset used is that the dataset is unbalanced because there are too few minority classes.This problem is known as the imbalance data set problem [11].One approach that is widely used in synthesizing new samples in the model is the Synthetic Minority Oversampling Technique (SMOTE).SMOTE is the development of the oversampling method.The way this works is by selecting the closest sample in the feature space, then drawing a line between the examples in the feature space and drawing a new sample along that line without changing the essence of the whole dataset [12].
This aims to improve the performance of the model used.Research related to the application of SMOTE to classification using Multi-Level Perceptron (MLP), k-Nearest Neighbor (K-NN), and other methods has succeeded in improving classification performance with the best results using the Naive Bayes method which has been SMOTE at 90.7% [13] .
Instantiation class minority   selected as the basis for creating new synthetic data points.Based on the distance metric, several nearest neighbors of the same class (points   up to  4 ) are selected from the training set.Finally, interpolate random conducted for get example new  1 until  4 [14].Image for illustration SMOTE algorithm can seen in Figure 3. Taken from reference sources [14].First, the minority class instances are randomly selected from the training set.Next, K nearest neighbors (5 by default) are obtained.Finally, this N of K instances are randomly selected to enumerate new instances.

Architecture Algorithm
XGBoost is an ensemble classification that uses gradient enhancement, whose model structure leads to a loss function which is further expanded by adding an expansion function [15].By reviewing the dataset used in XGBoost m is symbolized for the features in the dataset, and N is the number of data in the dataset used with equation 1.
On variable is the sample train data, and  the features of each sample. show circumstances payload in sample .Tree boosting made one group in equation 2.
Following is the formula for predict score from the tree boosting model with outputs   and  as function equation .With followed score on trees in equation 3 .
which is in equation 4. Gradient Boosting is method that can used for develop classification and regression models for optimize the model learning process, which partly big characteristic non-linear and more known as tree decision or regression [16]. .

𝛺(𝐷
( Algorithm gradient boost realize equation 5 with method integration additives: where  is the number of iterations.where   is the first derivative of the loss function which can be calculated as: Therefore, equation 8 can be turned into an optimization problem: It can be seen from equation 9 that the target fitting of   is the negative gradient of the loss function.Therefore, before training each tree in the Gradient Boosting Decision Tree, the target training is updated in each tree in equation 10 [17].
Adaboost is defined as an iterative algorithm that uses different classifiers for the same training set, then combines them to create the strongest classifier at the end [18].The Adaboost algorithm is specially made to handle classification problems which serve to improve the accuracy of weak learners [19].Adaboost has several advantages, which are simple and easy to implement [20].
Then next initialize weight  The sample so that  it is evenly distributed represents the sample weight (, ) specified in the iteration  with the equation 11.
For example  is the number of iterations for each  = 1... , based on the sample distribution , resulting in a sample to form a set .practice classifying ℎ on sets .Use classifiers ℎ to classify all sample set .
Classification is carried out in this round and the minimum error rate is E j in equation 12.
count weight with equation 13.
then for look for score  calculated in equation 14.
then update weight with equation 15.

Scenario Test
Referring to research [7].This study uses the same feature selection as previous studies.Where is the dropped class "time, amount, class, v28".
In this study, classification was carried out into two classes, namely fraudulent transactions, and non-fraud transactions.Then the dataset is split with a ratio of 80% for train data and 20% for test data.Furthermore, the data train is balanced using SMOTE.Meanwhile, the test data is left as the actual data.The amount summarized in Table 2 is the total dataset.Furthermore, each algorithm is tested in 2 scenarios with 3 different algorithm tests.Namely, XGBoost, Gradient Boosting, and AdaBoost are described in Table 4.

Results and Discussion
The results of this study were carried out based on the methodological arrangement described in the research Next, the dataset is downloaded and saved into Google Drive using an API integrated with Kaggle via Google Colab.Google Colab was chosen because it has collaboration features so researchers can easily share projects.Before processing, the dataset is split into a ratio of 80% train data and 20% test data which is continued using the SMOTE imblearn library [21], to balance the amount of data in each class.
The Fraud class has less data than the Not Fraud class so that the amount of data in the Fraud class will be added as much as the difference between the number of data between Fraud and Not Fraud classes in the training data, which is 283,823 data.data visualization after SMOTE is performed in Figure 4.Each pre-defined algorithm, training dataset that has been preprocessed is carried out by the default iterative fit process in the sklearn library [22].The next stage is analysis and test results, analysis can be done in several ways, one of which is by using a confusion matrix.
The confusion matrix is a tabular layout that compares the predicted class labels with the actual class labels in all data instances [23].Based on Figure 11, the confusion matrix table for the XGBoost scenario using SMOTE can be understood that in the Fraud class there are 91 image data that are predicted correctly and 9 data that are predicted incorrectly.As well as in the Not Fraud class there are 56264 correctly predicted data and 598 incorrectly predicted data by the model.Then continued Figure 13.
In Figure 12, the confusion matrix table for the Gradient Boosting scenario using SMOTE can be seen in the Fraud class, there are 92 image data that are correctly predicted and 8 image data that are predicted incorrectly.As well as in the Not Fraud class there are 56264 correctly predicted data and 673 incorrectly predicted data by the model.The three models are able to detect Fraud class well.The highest model of the three models is the Gradient Boosting model.
Each trial has a different recall value.Based on Table 5, the recall value in the three scenarios with SMOTE has a value above 90% except for Adaboost with SMOTE in the fraud class producing a value of 89%.
In the third scenario without SMOTE, the results obtained in the three algorithms are seen from the recall value for the not fraud class obtained a value of 100% but for the Fraud class it has not produced a value above the value of 90%.Then, to continue this research, it is recommended to use deep learning.This is intended to find out whether the same dataset when using deep learning can produce better accuracy, precision, and recall values.

Figure 2 .
Figure 2. Graphics amount of data each class 2.3 Synthetic Minority Oversampling Technique (SMOTE)

Figure 3 .
Figure 3. Illustration SMOTE algorithm On the equation,  defines the complexity of the model.∝as the controlling parameter and the leaf number of .is the magnitude of the leaf weight.During each round of model training data   , the XGBoost algorithm adds a new function to the model, keeping the final prediction result unchanged [15].

Figure 4 .
Figure 4. Number of classes that have been balanced 3. 1 Analysis and Test Results

Table 2 .
Total Data Before Conducted Oversampling

Table 3
is the number of SMOTE train data and test data.

Table 3 .
Total Data After Done SMOTE

Table 5 .
Test Results of Each Scenario Comparison of the Best Model Performance with Previous ResearchAfter a series of test scenarios have been carried out, the next process is to compare the performance of the best model with the results obtained in previous studies.Based on Table6, the classification report in the XGBoost scenario obtained 99% accuracy and 100% precision in the Not Fraud class and 93% in the Fraud class, where the data increased overall after using without SMOTE.

Table 6 .
The proposed classification report model According to Table7, this study produced a Gradient Boosting model using SMOTE as the best model, where this model can exceed the recall results of the model built in previous research by 10.37%.

Table 7 .
Recall Results of Each ScenarioBased on the research that has been done.SMOTE greatly affects the overall recall results.In previous research, Recall from the Random Forest model assisted by SMOTE obtained a yield of 81.63%.Therefore, this research succeeded in increasing the recall value using the best model, namely Gradient Boosting with SMOTE by 92%.The other 2 models, namely XGBoost and AdaBoost, produced a recall value of 91% for both, only 1% difference from the best model previously described.