Fraud detection under the unbalanced class based on gradient boosting

Raya Alothman; Hassanein Ali Talib; Mazin S. Mohammed

doi:10.15587/1729-4061.2022.254922

Author(s)

Raya Alothman, University of Mosul, Iraq https://orcid.org/0000-0002-8959-3353
Hassanein Ali Talib, University of Mosul, Iraq https://orcid.org/0000-0001-5276-9258
Mazin S. Mohammed, University of Mosul, Iraq https://orcid.org/0000-0003-1744-1219

Keywords:

machine learning, credit fraud modeling, imbalanced data, gradient boosting algorithms

Abstract

Credit fraud modeling is an important research topic. In the provision of credit services, the most important link is the management of overdue-debt risk, which directly affects the rate of return and the bad-debt ratio of lending institutions in this sector. Credit financial services have become available to the general public with the development of the mobile internet, and overdue-debt risk management has evolved from a manual, rule-based tool into a credit model built on large volumes of customer data to predict the probability of default. When a credit scoring model is built, the nature of credit samples leaves the minority class underrepresented: the large number of actual samples biases machine learning models toward the majority class during training. Traditional data balancing methods can reduce this bias toward the majority class when the data are relatively imbalanced, but not when the imbalance is excessive. For credit fraud detection, this work proposes gradient boosting algorithms (XGBoost and CatBoost) for modeling highly imbalanced data. Bayesian optimization is used to find the hyperparameters, with minority-class accuracy as the model's optimization function, in order to improve the model's accuracy on the minority class. The study was validated on real data on credit card fraud in Europe. The results were compared with traditional machine learning methods (decision trees and logistic regression) and with the performance of a bagging algorithm (random forest). A traditional data balancing method (oversampling) is used for comparison.
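
The oversampling baseline and the minority-class objective named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `random_oversample` and `minority_recall`, the toy labels, and the reading of "minority-class accuracy" as recall on the fraud class are assumptions made purely for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    have as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def minority_recall(y_true, y_pred, minority=1):
    """Fraction of true minority samples that were predicted as minority.
    Assumed here as one possible reading of 'minority-class accuracy'."""
    preds_for_minority = [p for t, p in zip(y_true, y_pred) if t == minority]
    if not preds_for_minority:
        return 0.0
    hits = sum(1 for p in preds_for_minority if p == minority)
    return hits / len(preds_for_minority)


# Toy data (made up): 8 legitimate (0) and 2 fraudulent (1) transactions.
rows = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
balanced_counts = Counter(balanced_labels)
```

In a tuning loop, a score such as `minority_recall` computed on held-out data would be the quantity a hyperparameter search tries to maximize, instead of overall accuracy, which is dominated by the majority class.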

Author Biographies

Raya Alothman, University of Mosul

Lecturer, Faculty Member

Department of Computer Science

College of Pure Sciences for Education

Hassanein Ali Talib, University of Mosul

Assistant Teacher, Faculty Member

Department of Computer Science

College of Pure Sciences for Education

Mazin S. Mohammed, University of Mosul

Assistant Teacher, Faculty Member

Department of Postgraduate
