Fraud detection under the unbalanced class based on gradient boosting
DOI: https://doi.org/10.15587/1729-4061.2022.254922
Keywords: machine learning, credit fraud modeling, unbalanced data, gradient boosting algorithms
Abstract
Credit fraud modeling is an important research topic. Overdue risk management is a critical business link in providing credit loan services: it directly affects the rate of return and the bad-debt ratio of lending organizations in this sector. With the development of the mobile Internet, credit financial services have become available to the general public, and overdue risk control has evolved from rule-based manual judgment to credit models that are built on large amounts of customer data and predict the likelihood of a customer becoming delinquent. When a credit rating model is built, the nature of credit samples means that the minority class is very small: even when a large number of real samples is collected, only a few of them belong to the minority class, which biases machine learning models toward the majority class during training. Traditional data balancing methods can reduce this bias toward the majority class when the imbalance is moderate rather than extreme. This paper proposes gradient boosting algorithms (XGBoost and CatBoost) for modeling highly unbalanced data to detect credit fraud. Bayesian optimization is used to find the hyperparameters, with the accuracy on the minority class as the model's optimization function, in order to increase the model's accuracy for the minority class. The approach was tested on real European credit card fraud data. The results were compared with traditional machine learning methods (decision trees and logistic regression) and with a bagging algorithm (random forest). For comparison, a traditional data balancing method (oversampling) was also used.
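The bias toward the majority class, and the reason a minority-class metric rather than overall accuracy is used as the optimization target, can be illustrated with a small standard-library sketch. Everything in it is an illustrative assumption, not the authors' setup: the one-feature synthetic data (998 legitimate and 2 fraudulent transactions, about 0.2% fraud), the hypothetical helpers `fit_stump`, `random_oversample` and `minority_recall`, a one-split decision stump standing in for the paper's XGBoost/CatBoost models, random duplication as the oversampling method, and recall on the fraud class as the reading of "accuracy of the minority class".

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Overall fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def minority_recall(y_true, y_pred, minority=1):
    """Share of minority-class (fraud) samples predicted as fraud (assumed metric)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == minority and p == minority)
    total = sum(1 for t in y_true if t == minority)
    return hits / total if total else 0.0


def random_oversample(X, y, minority=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes have equal size."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_size = max(n for label, n in counts.items() if label != minority)
    extra = [rng.choice(minority_idx) for _ in range(majority_size - counts[minority])]
    return list(X) + [X[i] for i in extra], list(y) + [y[i] for i in extra]


def fit_stump(X, y):
    """Hypothetical toy classifier: pick the threshold t that minimizes training
    errors for the rule 'fraud if x >= t'. t = inf means 'never predict fraud'."""
    candidates = sorted(set(X)) + [float("inf")]

    def errors(t):
        return sum((x >= t) != (label == 1) for x, label in zip(X, y))

    return min(candidates, key=errors)


def predict(t, X):
    return [1 if x >= t else 0 for x in X]


# Synthetic, illustrative data: 998 legitimate transactions and 2 frauds (0.2%).
X = [i / 1000 for i in range(998)] + [0.95, 0.99]
y = [0] * 998 + [1, 1]

t_raw = fit_stump(X, y)                 # trained on the imbalanced data
X_bal, y_bal = random_oversample(X, y)  # 998 legitimate + 998 fraud rows
t_bal = fit_stump(X_bal, y_bal)         # trained on the oversampled data

# For brevity, both stumps are scored on the original (non-duplicated) data,
# not on a held-out test set.
for name, t in [("imbalanced", t_raw), ("oversampled", t_bal)]:
    pred = predict(t, X)
    print(name, accuracy(y, pred), minority_recall(y, pred))
# imbalanced 0.998 0.0
# oversampled 0.952 1.0
```

Trained on the raw data, the stump never flags fraud and still reaches 99.8% accuracy with zero minority recall. After oversampling it catches both frauds at the cost of 48 false alarms. A minority-class score like `minority_recall` is the kind of objective a hyperparameter search (Bayesian optimization in the paper) would maximize instead of overall accuracy.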
License
Copyright (c) 2022 Raya Alothman, Hassanein Ali Talib, Mazin S. Mohammed
This work is licensed under a Creative Commons Attribution 4.0 International License.
The conditions for the transfer of copyright (identification of authorship) are set out in the License Agreement. In particular, the authors retain the authorship of their manuscript and transfer the right of first publication of this work to the journal under the terms of the Creative Commons CC BY license. At the same time, they have the right to enter independently into additional agreements for the non-exclusive distribution of the work in the form in which it was published by this journal, provided that a link to the first publication of the article in this journal is preserved.
A license agreement is a document in which the author warrants that he/she owns all copyright in the work (manuscript, article, etc.).
By signing the License Agreement with TECHNOLOGY CENTER PC, the authors keep all rights to further use of their work, provided that they link to the edition of the journal in which the work was published.
Under the terms of the License Agreement, the Publisher TECHNOLOGY CENTER PC does not take away the authors' copyright; it receives the authors' permission to use and disseminate the publication through the world's scientific resources (its own electronic resources, scientometric databases, repositories, libraries, etc.).
In the absence of a signed License Agreement, or if the agreement lacks identifiers that establish the author's identity, the editors have no right to work with the manuscript.
It is important to remember that there is another type of agreement between authors and publishers, in which copyright is transferred from the authors to the publisher. In this case, the authors lose ownership of their work and may not use it in any way.