Development of a method for fraud detection in heterogeneous data during installation of mobile applications
DOI:
https://doi.org/10.15587/1729-4061.2019.155060
Keywords:
fraud detection, heterogeneous data, installation of mobile applications, data abnormalities, data scaling
Abstract
A method for detecting fraud during the installation of mobile applications is proposed. Unlike existing methods, it uses all available data regardless of type, dimension, and discrepancies, and converts them into homogeneous coefficients using the proposed scaling method. This approach improves the accuracy of the solution and supports building an extensible knowledge base of fraudster characteristics and rules for detecting fraudulent users. A system of scales for converting heterogeneous data into homogeneous coefficients was developed, which made it possible to construct a mathematical model of the scaling process. An algorithm for scaling heterogeneous data sets was then developed from these scales and from the mathematical model of scaling large arrays of heterogeneous data; it reduces the whole data set to two homogeneous groups. Algorithms for processing the resulting groups of homogeneous data and for detecting fraudulent users are presented. Using coefficients of similarity between user characteristics, these algorithms form fingerprints of fraudsters and determine their characteristics and dependences, which increases the efficiency and speed of fraudster detection. A scheme of the fraud detection process is proposed; it was used in an intelligent system for automatic fraudster detection to carry out the experimental studies. In the experimental study, the accuracy of fraudster detection was 99.14 % on the given representative sample. The experimental results demonstrate the effectiveness of automatic fraudster detection and the possibility of extending the formats and characteristics of fraudsters on the basis of intelligent analysis and knowledge bases.
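The idea of converting heterogeneous data into homogeneous coefficients and then comparing users by similarity can be illustrated with a minimal sketch. The abstract does not specify the scales or the similarity coefficient, so everything here is an assumption made for illustration: min-max scaling for numeric fields, relative frequency for categorical fields, one minus the mean absolute difference as the similarity, and the field names `seconds_to_install` and `device_model`.

```python
from typing import Any, Dict, Tuple


def scale_numeric(value: float, lo: float, hi: float) -> float:
    """Min-max scale a number into [0, 1], clamping values outside [lo, hi]."""
    if hi == lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def scale_categorical(value: str, frequencies: Dict[str, float]) -> float:
    """Map a category to its relative frequency; unseen categories get 0.0."""
    return frequencies.get(value, 0.0)


def scale_record(
    record: Dict[str, Any],
    numeric_ranges: Dict[str, Tuple[float, float]],
    category_freqs: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Convert a mixed-type record into coefficients in [0, 1].

    Fields with no known scale are skipped (an assumption of this sketch).
    """
    scaled: Dict[str, float] = {}
    for field, value in record.items():
        if field in numeric_ranges:
            lo, hi = numeric_ranges[field]
            scaled[field] = scale_numeric(float(value), lo, hi)
        elif field in category_freqs:
            scaled[field] = scale_categorical(str(value), category_freqs[field])
    return scaled


def similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Return 1 minus the mean absolute difference over shared fields."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return 1.0 - sum(abs(a[f] - b[f]) for f in shared) / len(shared)


# Hypothetical scales and users, for illustration only.
numeric_ranges = {"seconds_to_install": (0.0, 600.0)}
category_freqs = {"device_model": {"A": 0.7, "B": 0.3}}

user_1 = scale_record({"seconds_to_install": 300, "device_model": "A"},
                      numeric_ranges, category_freqs)
user_2 = scale_record({"seconds_to_install": 3, "device_model": "B"},
                      numeric_ranges, category_freqs)
score = similarity(user_1, user_2)
```

Once every field is a coefficient on the same scale, a single similarity function can compare users across originally incompatible data types, which is the property the abstract attributes to the scaling step.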
License
Copyright (c) 2019 Tetiana Polhul, Andrii Yarovyi
This work is licensed under a Creative Commons Attribution 4.0 International License.
The License Agreement sets out the conditions for the transfer of copyright and the identification of authorship. In particular, the authors retain authorship of their manuscript and grant the journal the right of first publication of the work under the terms of the Creative Commons CC BY license. They may also enter into separate additional agreements for the non-exclusive distribution of the work in the form in which it was published by this journal, provided that a link to its first publication in this journal is preserved.
A license agreement is a document in which the author warrants that he/she owns all copyright for the work (manuscript, article, etc.).
The authors, signing the License Agreement with TECHNOLOGY CENTER PC, have all rights to the further use of their work, provided that they link to our edition in which the work was published.
Under the terms of the License Agreement, the Publisher TECHNOLOGY CENTER PC does not take away the authors' copyright; it receives the authors' permission to use and disseminate the publication through the world's scientific resources (its own electronic resources, scientometric databases, repositories, libraries, etc.).
If there is no signed License Agreement, or if the agreement lacks the identifiers needed to establish the author's identity, the editors have no right to work with the manuscript.
It is important to remember that there is another type of agreement between authors and publishers, in which copyright is transferred from the authors to the publisher. In that case, the authors lose ownership of their work and may not use it in any way.