Fraud detection under the unbalanced class based on gradient boosting

Raya Alothman; Hassanein Ali Talib; Mazin S. Mohammed

doi:10.15587/1729-4061.2022.254922

Authors

Raya Alothman University of Mosul, Iraq https://orcid.org/0000-0002-8959-3353
Hassanein Ali Talib University of Mosul, Iraq https://orcid.org/0000-0001-5276-9258
Mazin S. Mohammed University of Mosul, Iraq https://orcid.org/0000-0003-1744-1219

DOI:

https://doi.org/10.15587/1729-4061.2022.254922

Keywords:

machine learning, credit fraud modeling, unbalanced data, gradient boosting algorithms

Abstract

Credit fraud modeling is an important research topic. Overdue risk management is a critical business link in providing credit loan services: it directly affects the rate of return and the bad-debt ratio of lending organizations. With the development of the mobile Internet, credit financial services have come to benefit the general public, and overdue risk control has evolved from manual, rule-based judgment to credit models built on large amounts of customer data that predict the likelihood of a customer becoming delinquent. When a credit rating model is built, the nature of credit samples means that minority-class samples are very few even when a large number of real samples is collected, which biases machine learning models toward the majority class during training. Traditional data balancing methods can reduce this bias when the imbalance is moderate rather than extreme. This paper proposes gradient boosting algorithms (XGBoost and CatBoost) to model highly unbalanced data for credit fraud detection. Bayesian optimization is used to search for hyperparameters, with the accuracy on the minority class as the objective function, in order to increase the model's accuracy for that class. The models were tested on real European credit card fraud data, and the results were compared with traditional machine learning methods (decision trees and logistic regression) and with a bagging algorithm (random forest). For further comparison, a traditional data balancing method (oversampling) was also applied.
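The majority-class bias described above, and the oversampling baseline used for comparison, can be illustrated with a minimal standard-library Python sketch. It assumes binary labels with `1` as the minority (fraud) class; the toy data, the always-legitimate "model", and the function names `accuracy`, `minority_recall` and `random_oversample` are hypothetical illustrations, not the authors' code. "Accuracy of the minority class" is interpreted here as per-class accuracy, i.e. the recall of the fraud class.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Overall fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def minority_recall(y_true, y_pred, minority_label=1):
    """Per-class accuracy of the minority class: correctly flagged minority samples / all minority samples."""
    hits = [p == minority_label for t, p in zip(y_true, y_pred) if t == minority_label]
    return sum(hits) / len(hits) if hits else 0.0


def random_oversample(X, y, minority_label=1, seed=0):
    """Duplicate randomly chosen minority rows (with replacement) until both classes have the same size."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    if not minority_idx:
        raise ValueError("no minority-class samples to oversample")
    majority_size = len(y) - len(minority_idx)
    n_extra = max(majority_size - len(minority_idx), 0)
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    order = list(range(len(y))) + extra
    rng.shuffle(order)  # mix the duplicated rows into the training order
    return [X[i] for i in order], [y[i] for i in order]


if __name__ == "__main__":
    # Hypothetical toy data: 990 legitimate (0) and 10 fraudulent (1) transactions.
    y = [0] * 990 + [1] * 10
    X = [[float(i)] for i in range(len(y))]

    # A degenerate "model" that always predicts the majority class.
    always_legit = [0] * len(y)
    print("accuracy:", accuracy(y, always_legit))  # prints: accuracy: 0.99
    print("minority recall:", minority_recall(y, always_legit))  # prints: minority recall: 0.0

    X_bal, y_bal = random_oversample(X, y)
    # prints: class counts after oversampling: {0: 990, 1: 990}
    print("class counts after oversampling:", dict(sorted(Counter(y_bal).items())))
```

The degenerate model scores 99% overall accuracy while catching no fraud at all, which is why a minority-class metric is a more informative optimization target than overall accuracy under heavy imbalance. Random oversampling only duplicates existing minority rows, so in a sketch like this it would be applied to the training split alone, after the train/test split.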

Author Biographies

Raya Alothman, University of Mosul

Lecturer, Faculty Member

Department of Computer Science

College of Pure Sciences for Education

Hassanein Ali Talib, University of Mosul

Assistant Teacher, Faculty Member

Department of Computer Science

College of Pure Sciences for Education

Mazin S. Mohammed, University of Mosul

Assistant Teacher, Faculty Member

Department of Postgraduate
