Development of a method for fraud detection in heterogeneous data during installation of mobile applications

Authors

Tetiana Polhul, Andrii Yarovyi

DOI:

https://doi.org/10.15587/1729-4061.2019.155060

Keywords:

fraud detection, heterogeneous data, installation of mobile applications, data abnormalities, data scaling

Abstract

A method for detecting fraud during the installation of mobile applications is proposed. Unlike existing methods, it uses all available data regardless of type, dimension, or discrepancies, and converts the data into homogeneous coefficients using the proposed scaling method. This approach improves the accuracy of the solution and makes it possible to build an extensible knowledge base of fraudster characteristics and rules for detecting fraudulent users. A system of scales for converting heterogeneous data into homogeneous coefficients has been developed, which enabled the construction of a mathematical model of the scaling process. An algorithm for scaling heterogeneous data sets has been developed on the basis of the proposed scales and the mathematical model of scaling large arrays of heterogeneous data; it reduces the whole data set to two homogeneous groups. Algorithms for processing the resulting groups of homogeneous data and for detecting fraudulent users are presented. Using coefficients of similarity between user characteristics, these algorithms form fingerprints of fraudsters and determine their characteristics and dependences, which increases the efficiency and speed of fraudster detection. A scheme of the fraud detection process is proposed; it was used in an intelligent system for automatic fraudster detection to carry out the experimental studies. In the experimental study, the accuracy of fraudster detection was 99.14 % for the given representative sample. The experimental results demonstrate the effectiveness of automatic fraudster detection and the possibility of extending the formats and characteristics of fraudsters on the basis of intelligent analysis and knowledge bases.
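The general idea of scaling heterogeneous attributes into homogeneous coefficients and then comparing users by a similarity coefficient can be illustrated with a minimal Python sketch. It is not the authors' system of scales or their algorithm, which the abstract does not specify. The field names (`seconds_click_to_install`, `os_version`, `is_emulator`), the value ranges, the frequency table `OS_FREQ`, and the similarity measure (one minus the mean absolute difference) are all hypothetical choices made for illustration only.

```python
from typing import Dict, List

# Hypothetical relative frequencies of OS versions in a reference sample.
OS_FREQ: Dict[str, float] = {"9.0": 0.5, "8.1": 0.25, "7.0": 0.25}


def scale_numeric(value: float, lo: float, hi: float) -> float:
    """Min-max scale a number into [0, 1], clamping out-of-range values."""
    if hi == lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def scale_categorical(value: str, frequencies: Dict[str, float]) -> float:
    """Map a category to its relative frequency; unseen categories map to 0."""
    return frequencies.get(value, 0.0)


def scale_boolean(value: bool) -> float:
    """Map a flag to 1.0 (set) or 0.0 (not set)."""
    return 1.0 if value else 0.0


def to_coefficients(record: Dict[str, object]) -> List[float]:
    """Convert one heterogeneous record into a vector of [0, 1] coefficients.

    The three attributes and the 0..3600 s range are assumptions of this sketch.
    """
    return [
        scale_numeric(float(record["seconds_click_to_install"]), 0.0, 3600.0),
        scale_categorical(str(record["os_version"]), OS_FREQ),
        scale_boolean(bool(record["is_emulator"])),
    ]


def similarity(a: List[float], b: List[float]) -> float:
    """Similarity in [0, 1]: one minus the mean absolute difference."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    user = {"seconds_click_to_install": 1800.0, "os_version": "9.0", "is_emulator": False}
    fingerprint = [0.5, 0.5, 0.0]  # hypothetical stored fraudster fingerprint
    print(similarity(to_coefficients(user), fingerprint))
```

Once every attribute lies in the same [0, 1] range, records that started out as numbers, categories, and flags can be compared with a single distance-like measure. A real system would presumably compare each new user against many stored fingerprints and apply a decision threshold, which this sketch leaves out.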

Author Biographies

Tetiana Polhul, Vinnytsia National Technical University, Khmelnytske shose str., 95, Vinnytsia, Ukraine, 21021

Postgraduate student

Department of Computer Science

Andrii Yarovyi, Vinnytsia National Technical University, Khmelnytske shose str., 95, Vinnytsia, Ukraine, 21021

Doctor of Technical Sciences, Professor, Head of Department

Department of Computer Science

Published

2019-01-24

How to Cite

Polhul, T., & Yarovyi, A. (2019). Development of a method for fraud detection in heterogeneous data during installation of mobile applications. Eastern-European Journal of Enterprise Technologies, 1(2), 65–75. https://doi.org/10.15587/1729-4061.2019.155060