Automatic machine learning algorithms for fraud detection in digital payment systems
Keywords: digital payments, machine learning, automated synthesis, fraud detection, data science
Global financial statistics show that total losses from fraudulent transactions worldwide keep growing. The digitalization of economic relations is likely to exacerbate payment fraud, in particular as banks adopt the "Bank-as-a-Service" concept, which will increase the load on payment services.
The aim of this study is to synthesize effective models for detecting fraud in digital payment systems using automated machine learning and Big Data analysis algorithms.
Approaches to expanding the information base to detect fraudulent transactions have been proposed and systematized. The choice of performance metrics for building and comparing models has been substantiated.
Automatic machine learning algorithms are proposed for this task because they make it possible to evaluate a large number of candidate models, model ensembles, and input data sets in a short time. In our experiments, the resulting classifiers reached an AUC of 0.977‒0.982. This exceeds the quality of classifiers developed by traditional methods, while the time spent on model synthesis is much shorter and is measured in hours. The ensemble of models detected up to 85.7 % of the fraudulent transactions in the sample. The accuracy of fraud detection is also high (79‒85 %).
The results of our study confirm the effectiveness of using automatic machine learning algorithms to synthesize fraud detection models in digital payment systems. Here, effectiveness shows not only in the quality of the resulting classifiers but also in the lower cost of developing them and in their high potential for interpretability. Implementing the study results could enable financial institutions to reduce the money and time spent on developing and updating systems that actively counter payment fraud, and to improve the effectiveness of monitoring financial transactions.
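The quantities reported above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`roc_auc`, `recall_precision`, `ensemble_scores`) and the toy data are hypothetical, the ensemble is assumed to be a plain average of model scores, and reading the share of detected fraud as recall and the accuracy of fraud detection as precision is an assumption as well.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (fraud, legitimate) pairs in which the fraud
    case gets the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def recall_precision(labels: Sequence[int], scores: Sequence[float],
                     threshold: float) -> Tuple[float, float]:
    """Recall: share of fraud cases that were flagged.
    Precision: share of flagged cases that are fraud."""
    tp = fp = fn = 0
    for y, s in zip(labels, scores):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 1:
            fn += 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def ensemble_scores(model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Assumed ensemble rule: average the scores of all models
    for each transaction."""
    return [sum(col) / len(col) for col in zip(*model_scores)]


if __name__ == "__main__":
    # Made-up, imbalanced toy sample: 1 = fraud, 0 = legitimate.
    labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
    model_a = [0.05, 0.10, 0.20, 0.15, 0.30, 0.65, 0.10, 0.70, 0.90, 0.35]
    model_b = [0.10, 0.05, 0.10, 0.25, 0.20, 0.30, 0.20, 0.80, 0.70, 0.60]
    combined = ensemble_scores([model_a, model_b])
    print("AUC:", roc_auc(labels, combined))
    print("recall, precision:", recall_precision(labels, combined, 0.45))
```

AUC does not depend on a decision threshold, which makes it convenient for comparing many candidate models, whereas recall and precision describe one operating point and change with the chosen threshold.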
Copyright (c) 2020 Oleh Kolodiziev, Aleksey Mints, Pavlo Sidelov, Inna Pleskun, Olha Lozynska
This work is licensed under a Creative Commons Attribution 4.0 International License.