FRAUD DETECTION UNDER THE UNBALANCED CLASS BASED ON GRADIENT BOOSTING

Credit fraud modeling is an important research topic. Overdue risk management is a critical business link in providing credit loan services: it directly affects the rate of return and the bad-debt ratio of lending organizations. With the development of the mobile Internet, credit financial services have become available to the general public, and overdue risk control has evolved from manual, rule-based judgment to credit models built from large amounts of customer data that predict the likelihood of customers becoming delinquent. When a credit rating model is built, however, minority-class samples are very scarce even when a large number of real samples is obtained, which biases machine learning models towards the majority class during training. Traditional data balancing methods can reduce this bias when the data is moderately rather than excessively unbalanced.
This paper proposes gradient boosting algorithms (XGBoost and CatBoost) to model highly unbalanced data in order to detect credit fraud. To increase the model's accuracy for the minority class, Bayesian optimization is used to find the hyperparameters, with the accuracy of the minority class as the optimization objective. The approach was tested on real European credit card fraud data. The results were compared with traditional machine learning methods (decision trees and logistic regression) and with a bagging algorithm (random forest). For comparison, the traditional data balancing method (Oversample) is also applied.
Keywords: machine learning, credit fraud modeling, unbalanced data, gradient boosting algorithms


Introduction
With the development of the banking sector, the accelerated expansion of financial derivatives has led to increased market volatility and credit fraud [1,2]. Credit fraud detection assesses an applicant's credit rating by uncovering the objective patterns contained in credit data, which makes it essentially a binary classification problem [3]. However, when a credit rating model is built, minority-class samples are very scarce even when a large number of real samples is obtained [4]: samples with genuine fraudulent behavior are far fewer than non-fraudulent ones. When dealing with such unbalanced credit fraud data, misclassifying a customer with poor credit is often more costly to the organization than misclassifying a customer with good credit [5,6]. Therefore, improving model performance on unbalanced data has become a focus of research on credit fraud identification.
Research on the imbalance problem is mainly based on resampling, which comes in two types: undersampling and oversampling. Oversampling is mainly represented by the synthetic minority oversampling technique (SMOTE), which adds newly generated minority-class samples to the data to achieve class balance [7]. Although this method has become a classic solution to class imbalance over the decades, it still has drawbacks [8].
Traditional methods can solve the problem of classifying unbalanced data to a certain extent, but two main drawbacks remain: 1) performance evaluation tools are imperfect: most of the literature still relies on overall accuracy, which inevitably leads to over-focusing on majority-class samples with good credit and ignoring minority-class samples with poor credit; 2) severe imbalance is less researched: in many of the data sets studied, the minority class usually accounts for up to 20 %, whereas in real credit fraud detection it is 0.02 % or less. In such highly imbalanced situations, the design and testing of algorithms face significant challenges.

Literature review and problem statement
According to [9], imbalanced classification algorithms are unsuccessful when the data is substantially uneven. Current procedures produce many false alarms, which are expensive for financial institutions, may lead to detection errors and can increase the number of fraud cases. The comparison in [9] was made using accuracy and sensitivity, which are insufficient metrics for unbalanced data.
The study [10] presents a number of algorithms used to classify transactions. SMOTE was employed to generate additional minority-class samples because the data set is unbalanced. The research used a credit card fraud (CCF) data set; feature selection was performed, and the data set was split into training and testing sets. The algorithms applied were random forest, fabric foundation, and multilayer perceptron. The results suggest that any of these algorithms can be used to identify CCF and that the suggested model can also detect other scenarios. CCF refers to the loss of credit card (CC) data, and many machine learning algorithms can be used to detect it. The study concluded that random forest is the best classifier, but that it gives different results if random_state is not specified.
In [11], a hierarchical clustering technique was presented and used to address class imbalance in fraud detection. A clustering tree is built in two steps: first, the clustering method is defined while considering the potential for class separation; second, the clustering algorithm is used to produce a tree that can assess whether an incoming transaction is legitimate. The authors use random undersampling to deal with class imbalance, which is not suited to excessive imbalance.
In [12], a three-step strategy for predicting credit fraud is proposed. First, PCA is applied to extract the relevant features and reduce the dimensionality of the data. Second, a combination of k-means clustering and hyper-SMOTE is used to resample the imbalanced data. Third, the Tomek Link method is employed to eliminate noisy data. Four classification techniques were applied to the resulting dataset: logistic regression, decision tree classifier, k-nearest neighbors, and neural networks, with 5-fold cross-validation. According to the research, the neural networks incorporating FusedRCE had the highest prediction rate. The proposed method thus involves multiple stages in addition to data redistribution.
The authors of [13] present three approaches to unbalanced datasets: resampling methods (undersampling and oversampling), cost-sensitive training, and classification algorithms (decision tree, random forest, and Naive Bayes), emphasizing why the Receiver Operating Characteristic (ROC) curve should not be used to measure algorithm performance on such datasets. To examine the performance metrics of these three approaches, experiments were conducted on a total of 890,977 banking transactions. The study concluded that random forest with oversampling is the best classifier, but it did not address the hyperparameters of the algorithm, specifically random_state.
In [14], a comparative study of various methods for treating class imbalance was carried out. Ensemble classification models, including AdaBoost, XGBoost, and Random Forest, were evaluated to assess the efficacy and efficiency of various data balancing methodologies paired with recent classification approaches. The authors concluded that data redistribution strategies are ineffective and that algorithms based on ensemble (collective) learning perform better, but they did not address hyperparameters or how to find them.
In [15][16][17], a genetic algorithm (GA) was used for feature selection and aggregation in an intelligent payment card fraud detection system. To test the efficacy of the suggested strategy, the authors used a variety of machine learning methods. The data sets used did not exhibit excessive imbalance.
Various challenges were described in previous literature reviews, but class imbalance was the biggest problem for the data set. Class imbalance is a problem in which legitimate transactions far outnumber fraudulent ones. Many researchers have already worked on the imbalance problem, and many solutions have been proposed using machine learning algorithms. Our goal is to find an effective solution that addresses class imbalance on the basis of criteria such as precision, recall, and F1 score.

The aim and objectives of the study
This study aims to model unbalanced credit fraud data based on gradient boosting.
To achieve the aim, the following objectives are accomplished:
- to model highly unbalanced credit fraud data using gradient boosting algorithms (XGBoost and CatBoost) and compare them with traditional algorithms;
- to use Bayesian optimization to find the hyperparameters, with the accuracy of the minority class as the optimization function, in order to increase the model's accuracy for the minority class;
- to apply traditional data balancing methods (Oversample) for comparison;
- to compare the proposed method with previous work that used the same data set.

Materials and methods
This paper proposes a credit fraud detection approach for highly unbalanced data based on gradient boosting. To increase the model's accuracy for the minority class, Bayesian optimization is used to find the hyperparameters, with the accuracy of the minority class as the optimization function of the model. Finally, the approach was tested on real European credit card fraud data, and its performance was compared with traditional machine learning and classic imbalance algorithms.
This section describes the research methodology and the procedures followed during the experiment. The methodology includes the description of the data set, the division of the data into training and testing sets, and classification methods such as logistic regression, decision tree, and gradient boosting models (XGBoost, CatBoost). The Oversample method is used to balance the data into classes of equal size. The algorithms are evaluated using accuracy, precision, recall, and F1 score. The steps involved in credit card fraud detection are shown in the flowchart in Fig. 1.
The dataset consists of 31 features and 284,807 transactions made by European cardholders, of which 492 are fraudulent [18][19][20][21]. Since fraudulent transactions account for only 0.173 % of the total, the data set is highly unbalanced and severely skewed. Fig. 2 shows the large disparity between the majority and minority classes.
Logistic regression is a machine learning model widely used for classification. Based on maximum likelihood estimation from statistical theory, logistic regression can quickly and effectively solve linear classification problems. When training the logistic regression model, we classify the model's output into two classes: 0 means that the loan is in default, and 1 means that the borrower has fulfilled the obligations without default. Logistic regression uses the sigmoid function to map the linear combination of the features to a probability between 0 and 1 [16]. Binary logistic regression is represented by
$$P(y=1\mid x)=\frac{1}{1+e^{-(\beta_0+\beta_1x_1+\beta_2x_2+\dots+\beta_nx_n)}},\qquad (1)$$
where P(y=1|x) is the probability of belonging to class 1, x_1, ..., x_n are the features, and the coefficients β_0, ..., β_n are derived from maximum likelihood estimation [17].
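Eq. (1) and its maximum likelihood fit can be illustrated with a minimal pure-Python sketch. It assumes batch gradient ascent on the log-likelihood; the paper does not state which solver it used (libraries such as scikit-learn use other optimizers), and the names `fit_logistic` and `predict_proba` as well as the toy data are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(beta, x):
    """Eq. (1): P(y = 1 | x) with beta = [beta_0, beta_1, ..., beta_n]."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Approximate maximum likelihood estimate of beta by batch gradient ascent."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(beta, xi)  # d(log-likelihood)/dz for one sample
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta

# Hypothetical toy data: one feature, class 1 for the larger values.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
beta = fit_logistic(X, y)
print(beta, predict_proba(beta, [0.0]), predict_proba(beta, [3.0]))
```

Gradient ascent is used here only because it makes the likelihood maximization explicit in a few lines.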
Decision trees are among the simplest and most widely used machine learning structures and are used for both classification and regression. A tree consists of a root at the top (unlike trees in nature), branches that divide the data into several subsets, and leaves that represent the final decision of each branch. The tree is built from the root downwards: the attribute that best separates the classes is chosen as the root, and this process is repeated for each branch, so that the tree is built from the most important split to the least important one [22,23]. The following three impurity measures are used to select the splitting attribute:
$$\mathrm{Entropy}(t)=-\sum_{i=1}^{c}p(i\mid t)\log_2 p(i\mid t),$$
$$\mathrm{Gini}(t)=1-\sum_{i=1}^{c}p(i\mid t)^2,$$
$$\mathrm{Classification\ error}(t)=1-\max_i p(i\mid t),$$
where c is the number of classes and p(i|t) is the fraction of records at node t that belong to class i.
Single decision trees are now used less often on their own because of the emergence of more advanced tree ensembles, such as random forests and boosted trees.
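The three impurity measures can be sketched in pure Python as follows. The node composition (998 legitimate and 2 fraudulent records) is a hypothetical example, chosen to show that on highly unbalanced data every measure is already close to 0 at the node, so isolating the few fraud cases yields only a small absolute impurity reduction.

```python
import math

def class_probabilities(labels):
    """p(i|t): fraction of the records at node t that belong to each class i."""
    n = len(labels)
    return [labels.count(c) / n for c in sorted(set(labels))]

def entropy(p):
    # Classes with p(i|t) = 0 contribute nothing (0 * log 0 is taken as 0).
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def gini(p):
    return 1.0 - sum(pi * pi for pi in p)

def classification_error(p):
    return 1.0 - max(p)

# Hypothetical node with 998 legitimate (0) and 2 fraudulent (1) records.
node = [0] * 998 + [1] * 2
p = class_probabilities(node)
for name, f in [("entropy", entropy), ("gini", gini), ("error", classification_error)]:
    print(f"{name:8s} {f(p):.6f}")
```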
XGBoost (eXtreme Gradient Boosting), designed by Chen Tianqi, is an improved gradient boosting algorithm [24]. It improves the treatment of the loss function in two ways. On the one hand, the loss function is approximated by a second-order Taylor expansion instead of the first-order expansion used in traditional gradient boosting, so both the first and the second derivatives of the loss are used.
On the other hand, a regularization term that controls model complexity is added to the loss function. The algorithm adds trees one at a time and grows each tree by feature splitting. Each time a tree is added, a new function is learned that fits the residual error of the previous prediction. Finally, for a given tree structure the optimal leaf scores can be obtained, and the prediction for a sample is the sum of the scores of the leaves it reaches in all trees [25,26].
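The second-order objective can be made concrete for the logistic loss. For a sample with label y_i and current predicted probability p_i, the gradient is g_i = p_i - y_i and the Hessian is h_i = p_i(1 - p_i). With the regularization term γT + ½λΣw_j² (T leaves with scores w_j), the optimal score of leaf j is w_j* = -G_j/(H_j + λ), where G_j and H_j are the sums of g_i and h_i over the leaf, and a split gains ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) - (G_L+G_R)²/(H_L+H_R+λ)] - γ. The pure-Python sketch below evaluates these formulas for an exact greedy split on one feature; it is an illustration, not the XGBoost library's code, and the function names and toy data are hypothetical.

```python
import math

def grad_hess_logistic(y, margin):
    """First and second derivative of the logistic loss w.r.t. the raw margin."""
    p = 1.0 / (1.0 + math.exp(-margin))
    return p - y, p * (1.0 - p)

def leaf_weight(G, H, lam):
    """Optimal leaf score of the second-order objective: w* = -G / (H + lambda)."""
    return -G / (H + lam)

def split_gain(GL, HL, GR, HR, lam, gamma):
    """Reduction of the regularized objective obtained by splitting one leaf in two."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma

def best_split(xs, ys, margins, lam=1.0, gamma=0.0):
    """Exact greedy search over all thresholds of one feature."""
    gh = [grad_hess_logistic(y, m) for y, m in zip(ys, margins)]
    G, H = sum(g for g, _ in gh), sum(h for _, h in gh)
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    GL = HL = 0.0
    best_gain, best_threshold = float("-inf"), None
    for k in range(len(order) - 1):
        g, h = gh[order[k]]
        GL, HL = GL + g, HL + h
        x_lo, x_hi = xs[order[k]], xs[order[k + 1]]
        if x_lo == x_hi:
            continue  # no threshold separates equal feature values
        gain = split_gain(GL, HL, G - GL, H - HL, lam, gamma)
        if gain > best_gain:
            best_gain, best_threshold = gain, (x_lo + x_hi) / 2
    return best_gain, best_threshold

# Hypothetical toy data: the last two of eight records are fraudulent (y = 1);
# all margins start at 0, i.e. p = 0.5 for every record.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 0, 0, 1, 1]
gain, threshold = best_split(xs, ys, [0.0] * 8)
print(gain, threshold)
# Leaf statistics (G, H) of the best split are (3.0, 1.5) and (-1.0, 0.5).
print(leaf_weight(3.0, 1.5, 1.0), leaf_weight(-1.0, 0.5, 1.0))
```

The best split (threshold 6.5) isolates the two fraud records, and their leaf receives a positive score that raises their predicted fraud probability in the next round.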
The CatBoost algorithm (Categorical Boosting) is an improved algorithm based on the gradient boosting decision tree (GBDT) framework [27]. It can handle various types of data, its parameters are easy to tune, and it can provide more accurate results than the XGBoost algorithm. CatBoost was originally designed to improve GBDT's handling of categorical features: the earlier approach replaced each categorical value with the mean label of the corresponding samples [28][29][30], which causes a conditional shift problem. CatBoost improves these statistics by adding a prior term and its corresponding weight, which reduces the influence of low-frequency categories and effectively reduces noise [24]. Another improvement of CatBoost is replacing the traditional gradient estimation with ordered boosting, which yields unbiased gradient estimates, reduces the gradient estimation error and overfitting, and ultimately improves model generalization [31,32].
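The idea of a target statistic with a prior, computed in an ordered way, can be sketched as follows. This is a simplified illustration under stated assumptions, not CatBoost's implementation: it uses a single random permutation (CatBoost uses several), a prior `prior` (for example the overall fraud rate) with weight `a`, and a hypothetical function name. For each sample, the encoding is (sum of the targets of earlier samples with the same category + a·prior) / (number of such samples + a).

```python
import random

def ordered_target_statistics(categories, targets, prior, a=1.0, seed=0):
    """Encode one categorical feature with ordered target statistics.

    Samples are visited in a random permutation; each sample is encoded using
    only the targets of earlier samples with the same category, so its own
    label never leaks into its encoding (simplified: one permutation only).
    """
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)
    sums, counts = {}, {}
    encoded = [0.0] * len(categories)
    for i in order:
        c = categories[i]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        encoded[i] = (s + a * prior) / (n + a)  # prior term with weight a
        sums[c] = s + targets[i]
        counts[c] = n + 1
    return encoded

# Hypothetical merchant categories with fraud labels; prior = overall fraud rate.
cats = ["shop", "shop", "shop", "atm", "web"]
labels = [1, 0, 0, 0, 1]
enc = ordered_target_statistics(cats, labels, prior=sum(labels) / len(labels))
print(enc)
```

Categories seen for the first time (here "atm" and "web") fall back to the prior, which is how the prior term damps noisy statistics for rare categories.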
The Bayesian optimization algorithm makes full use of the information from previous evaluations. It learns the shape of the objective function and searches for the parameters that bring the objective function to its global optimum. Specifically, it starts from a prior distribution over the objective function; every time the objective function is evaluated at a new sampling point, this information is used to update the prior into a posterior distribution. The algorithm then tests the point where, according to the posterior distribution, the global maximum is most likely to appear. One point requires attention: once a local optimum is found, the algorithm may keep sampling in that area and thus easily become trapped in the local optimum. To compensate for this shortcoming, Bayesian optimization balances exploration and exploitation. Exploration obtains sampling points in regions that have not yet been sampled, while exploitation samples the region where, according to the posterior distribution, the global maximum is most likely [33,34].
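The exploration/exploitation trade-off can be illustrated with a minimal pure-Python sketch: a Gaussian-process surrogate with a squared-exponential kernel and the expected-improvement acquisition function, searching one hyperparameter over a grid. Everything here is an assumption for illustration: the paper does not state which surrogate or acquisition function it used, and `toy_objective` is a hypothetical stand-in for the validation precision of the minority class as a function of one hyperparameter.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel with unit variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def expected_improvement(mean, std, best):
    """EI for maximization: large where the posterior mean is high (exploitation)
    or where the posterior uncertainty is high (exploration)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def bayes_opt(objective, candidates, init, n_iter=10, noise=1e-6):
    xs = list(init)
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        mu = sum(ys) / len(ys)
        yc = [y - mu for y in ys]  # centre scores to match the zero-mean GP prior
        K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
             for i, a in enumerate(xs)]
        alpha = solve(K, yc)
        best_ei, x_next = -1.0, None
        for c in candidates:
            if c in xs:
                continue  # each candidate is evaluated at most once
            k = [rbf(a, c) for a in xs]
            mean = sum(ki * ai for ki, ai in zip(k, alpha))  # posterior mean
            v = solve(K, k)
            var = max(1.0 - sum(ki * vi for ki, vi in zip(k, v)), 1e-12)  # posterior variance
            ei = expected_improvement(mean, math.sqrt(var), max(yc))
            if ei > best_ei:
                best_ei, x_next = ei, c
        xs.append(x_next)  # the posterior is refitted with this point next iteration
        ys.append(objective(x_next))
    i = ys.index(max(ys))
    return xs[i], ys[i]

def toy_objective(h):
    # Hypothetical stand-in for "minority-class precision vs. one hyperparameter".
    return -(h - 0.3) ** 2

grid = [i / 50 for i in range(51)]
best_h, best_score = bayes_opt(toy_objective, grid, init=[0.0, 0.5, 1.0])
print(best_h, best_score)
```

In practice, the objective would train the model with the proposed hyperparameter values and return its minority-class score on validation data; dedicated libraries handle several hyperparameters and continuous search spaces.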

Fig. 2. Comparison between the majority and minority classes
It is proposed to use Bayesian optimization, which is based on Bayes' theorem, to find the optimal hyperparameters, since it can search a wider space of parameter values, and to test the efficiency of this method.

2. Performance evaluation
Overall accuracy is commonly used to check the performance of machine learning models. For excessively unbalanced data, however, accuracy is inappropriate for assessing model performance, because it remains high even if the model is completely biased towards the majority class. This paper therefore uses recall, precision, and F1 score for both classes to determine the model's performance on the minority and majority classes.
Precision measures how reliably the model assigns a particular class: it is the proportion of cases predicted as that class that actually belong to it. It can be expressed as
$$\mathrm{Precision}=\frac{TP}{TP+FP},$$
where TP and FP are the numbers of true and false positives for the class under consideration.
Recall measures the proportion of the actual cases of a particular class that were classified correctly. It can be expressed as
$$\mathrm{Recall}=\frac{TP}{TP+FN},$$
where FN is the number of false negatives.
The F1 score is a comprehensive measure that combines precision and recall as their harmonic mean; it combines the product and the sum of the two components, so it is high only when both are high:
$$F1=\frac{2\cdot\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}.$$
It is a helpful combined measure for unbalanced classes.
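To make the definitions concrete, the sketch below computes per-class precision, recall, and F1 in pure Python and applies them to a trivial classifier that labels every transaction of the data set described above (284,807 transactions, 492 fraudulent) as legitimate. The function name and the convention of returning 0 when a denominator is 0 are assumptions for illustration.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as 'positive'."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted this class, but the record belongs to the other one
        elif t == positive:
            fn += 1  # missed a true member of this class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

n_total, n_fraud = 284_807, 492
y_true = [1] * n_fraud + [0] * (n_total - n_fraud)  # 1 = fraud (minority)
y_pred = [0] * n_total                               # always "legitimate"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n_total
print(f"fraud share: {100 * n_fraud / n_total:.3f} %")
print(f"accuracy:    {accuracy:.4f}")
print("minority (fraud) P/R/F1:", per_class_metrics(y_true, y_pred, positive=1))
print("majority P/R/F1:", per_class_metrics(y_true, y_pred, positive=0))
```

The accuracy rounds to 0.9983 even though the model never detects a single fraud case: all three minority-class metrics are 0, which is why the per-class measures are reported instead.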

5. Results of applying machine learning models to predict fraud risk

1. Results of using gradient boosting algorithms compared to traditional algorithms
The first step is an initial evaluation of each machine learning algorithm's performance. These algorithms are logistic regression, decision tree, random forest, and gradient boosting models (XGBoost, CatBoost). Table 1 compares the performance of the models on the minority class. Table 1 shows that CatBoost and XGBoost achieve the highest recall and F1 score, and that CatBoost outperforms XGBoost in precision, reaching 95 %. Random forest also performed well, only slightly below the gradient boosting algorithms, scoring 0.94, 0.82, and 0.87 for precision, recall, and F1 score, respectively.

2. Results of using Bayesian optimization to find hyperparameters
To reduce the bias of the models, Bayesian optimization was used to find the hyperparameters. Table 2 compares the CatBoost algorithm with and without Bayesian optimization. It shows that precision improved, while recall and F1 score were not affected.

3. Algorithms comparison using oversample
The second stage assesses how well each machine learning algorithm performs when the Oversample technique is applied to the data set before training the same algorithms. Table 3 compares the performance of the models on the minority class after oversampling. As shown in Table 3, XGBoost was the best model for the minority class, scoring 0.96 for precision and 0.89 for F1 score. Logistic regression was excessively biased: its minority-class precision was only 0.04, meaning that many legitimate transactions were labeled as fraud, yet it achieved the best recall (0.90), because recall depends only on TP and FN and does not penalize false positives.
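The paper does not specify which oversampling variant its "Oversample" step uses. The sketch below assumes plain random oversampling (duplicating randomly chosen minority rows until the classes are equal) and, for contrast, a simplified SMOTE-style variant, the technique used in several of the reviewed works, that interpolates between a minority row and one of its k nearest minority neighbours. Both function names and the toy data are hypothetical.

```python
import random

def random_oversample(X, y, minority=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equally large."""
    rng = random.Random(seed)
    rows = [x for x, t in zip(X, y) if t == minority]
    n_extra = (len(y) - len(rows)) - len(rows)  # majority size minus minority size
    extra = [rng.choice(rows) for _ in range(n_extra)]
    return X + extra, y + [minority] * len(extra)

def smote_like(X, y, minority=1, k=2, seed=0):
    """Simplified SMOTE: new points on segments between a minority row and one of
    its k nearest minority neighbours (squared Euclidean distance)."""
    rng = random.Random(seed)
    rows = [x for x, t in zip(X, y) if t == minority]
    n_extra = (len(y) - len(rows)) - len(rows)
    new = []
    for _ in range(n_extra):
        base = rng.choice(rows)
        neighbours = sorted((r for r in rows if r is not base),
                            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
        nb = rng.choice(neighbours[:k])
        u = rng.random()
        new.append([a + u * (b - a) for a, b in zip(base, nb)])
    return X + new, y + [minority] * len(new)

# Hypothetical toy data: 7 legitimate rows and 3 fraudulent rows (x0 = 7, 8, 9).
X = [[float(i), 0.0] for i in range(10)]
y = [0] * 7 + [1] * 3
Xr, yr = random_oversample(X, y)
Xs, ys = smote_like(X, y)
print(yr.count(0), yr.count(1), ys.count(0), ys.count(1))
```

On the full data set (492 fraud cases against 284,315 legitimate ones), balancing the classes this way adds 283,823 artificial minority rows, about 577 per real fraud case.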

4. Comparison with previous works
After analyzing the results, the CatBoost model was found to be the best for excessively unbalanced data. To determine the effectiveness of this method, it is compared in Table 4 with models from previous works that used the same data set and the same metrics. Compared with both the selected models and previous studies, CatBoost performed best in modeling the unbalanced data.

Discussion of the results of applying machine learning models to predict fraud risk
Machine learning models based on gradient boosting, specifically the CatBoost algorithm, combined with hyperparameter optimization driven by the prediction accuracy of the minority class, can tackle the imbalance problem and reduce the model's bias towards the majority class. The results show that the proposed method achieved 97 % precision, 83 % recall, and 89 % F1 score for the minority class, the highest compared with the other methods shown in Table 1.
Because each gradient boosting model is fitted to the errors of the models that precede it, later models concentrate on the errors of the previous ones (false predictions in the minority class), which mitigates the imbalance problem. Unlike traditional data balancing methods, gradient boosting models do not need to redistribute the data as a preprocessing stage. When the data balancing method (Oversample) was applied, performance deteriorated on various metrics for most models (Table 3), because the data is highly unbalanced.
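The error-driven mechanism described above can be illustrated with a minimal sketch of gradient boosting for the logistic loss, using least-squares regression stumps on a single feature. The toy data (eight legitimate and two fraudulent records) and the function names are hypothetical, and real implementations such as XGBoost and CatBoost add second-order steps, regularization, and much more. In the first round the pseudo-residuals y - p equal 0.8 for the two fraud records and -0.2 for the others, so the first stump is fitted mainly to the minority class.

```python
import math

def fit_stump(xs, residuals):
    """Least-squares regression stump on one feature: (threshold, left value, right value)."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]

def gradient_boost(xs, ys, n_rounds=20, lr=0.5):
    """Gradient boosting for the logistic loss with regression stumps."""
    rate = sum(ys) / len(ys)
    f0 = math.log(rate / (1.0 - rate))  # start from the class log-odds
    stumps = []

    def margin(x):
        return f0 + lr * sum(lv if x <= t else rv for t, lv, rv in stumps)

    def proba(x):
        return 1.0 / (1.0 + math.exp(-margin(x)))

    for _ in range(n_rounds):
        # Pseudo-residuals y - p (negative gradient of the log-loss): each new stump
        # is fitted to the errors that the current ensemble still makes.
        residuals = [y - proba(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return proba

# Hypothetical toy data: 8 legitimate records and 2 fraudulent ones (x = 8, 9).
xs = list(range(10))
ys = [0] * 8 + [1] * 2
predict = gradient_boost(xs, ys)
print([round(predict(x), 3) for x in xs])
```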
The limitations of this study lie in using a single data set and a single optimization method.
To prove the effectiveness of these algorithms, future work can use several data sets with varying levels of imbalance and from more than one field.
Special algorithms for unbalanced data can also be developed that focus on identifying the minority class, as in one-class classification algorithms.

Conclusions
1. Gradient boosting is a promising solution to the problem of data imbalance: the results showed the superiority of the CatBoost model over the other models applied. The CatBoost algorithm achieved 95 % precision for the minority class.
2. Determining hyperparameters with Bayesian optimization, using the model's accuracy for the minority class as the objective, improves the model's performance. The CatBoost+BO algorithm achieved 97 % precision for the minority class.
3. Traditional data balancing methods (Oversample) are not suitable in cases of excessive imbalance, such as credit fraud. When oversampling was applied with the CatBoost algorithm, the model's precision for the minority class deteriorated to 82 %.
4. Compared with previous work that used the same data set, the proposed method performs better on several metrics that measure the model's accuracy for the minority class.