Devising a method for detecting information threats in the Ukrainian cyber space based on machine learning
DOI:
https://doi.org/10.15587/1729-4061.2024.317456
Keywords:
information threat, fake news, machine learning, disinformation detection, dataset, cyber security
Abstract
The object of this study is the process of detecting disinformation using search algorithms for identifying fake news. The main task was to define a set of criteria and parameters for detecting Ukrainian-language disinformation with machine learning. The study considers a methodology for building and populating a dataset of fakes, which is then used to train and test a model that identifies disinformation and propaganda and determines the attributes of primary sources and the routes by which they spread. This provides a reasoned basis for defining a model that forecasts the development of information threats in the cyberspace of Ukraine. In particular, the accuracy of automatically estimating the probability of disinformation in texts can be increased. With classical machine learning classifiers trained on balanced datasets, the accuracy of identifying and recognizing fakes is ≥90 % for English-language texts and ranges from ≥52 % to ≤90 % for Ukrainian-language texts. These results made it possible to formulate requirements for the structure and content of a typical dataset of fakes for the period after the full-scale invasion of Ukraine. The practical result of this work is a decision support system for monitoring, detecting, recognizing, and forecasting information threats in the cyberspace of Ukraine based on NLP and machine learning. Preprocessing Ukrainian-language news in a way that takes the linguistic features of the language into account increases the accuracy of fake identification by a factor of ≈1.72. Approaches to constructing models for forecasting the development of information threats in cyberspace have been developed, which is an urgent task given that fake news and information manipulation can affect public sentiment, politics, and the economy.
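As an illustration of the kind of pipeline the abstract describes (language-aware preprocessing followed by a classical classifier trained on a balanced dataset), the following is a minimal sketch only. It is not the authors' system: the multinomial Naive Bayes classifier, the tiny stop-word list, the function and class names, and the four toy training texts are all assumptions introduced here for illustration. A real Ukrainian-language pipeline would likely add lemmatization and a much larger labeled dataset.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (Ukrainian and English).
STOP_WORDS = {"і", "та", "в", "на", "що", "це", "the", "a", "of", "and", "is", "in"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens (Latin or Cyrillic), drop stop words."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(preprocess(text))
        # Vocabulary is the union of all words seen in any class.
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        tokens = preprocess(text)
        n_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / n_docs)
            for t in tokens:
                count = self.word_counts[c][t]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Toy balanced dataset (two texts per class); invented for this sketch.
train_texts = [
    "shocking secret cure doctors hide from you",
    "miracle secret trick shocking truth revealed",
    "city council approves annual budget report",
    "ministry publishes official annual statistics report",
]
train_labels = ["fake", "fake", "real", "real"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("shocking miracle cure revealed"))
print(model.predict("official budget report"))
```

Keeping the class counts equal in the training set mirrors the balanced-dataset condition mentioned in the abstract; with unbalanced data the log prior term would bias predictions toward the larger class.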
License
Copyright (c) 2024 Victoria Vysotska, Mariia Nazarkevych, Serhii Vladov, Olga Lozynska, Oksana Markiv, Roman Romanchuk, Vitalii Danylyk
This work is licensed under a Creative Commons Attribution 4.0 International License.
The assignment of copyright and the conditions for its transfer (identification of authorship) are set out in the License Agreement. In particular, the authors retain authorship of their manuscript and grant the journal the right of first publication of the work under the terms of the Creative Commons CC BY license. They may also conclude separate additional agreements for the non-exclusive distribution of the work in the form in which it was published by this journal, provided that a link to the first publication of the article in this journal is preserved.
A license agreement is a document in which the author warrants that he/she owns all copyright for the work (manuscript, article, etc.).
Authors who sign the License Agreement with TECHNOLOGY CENTER PC retain all rights to the further use of their work, provided that they link to the edition of our journal in which the work was published.
Under the terms of the License Agreement, the publisher TECHNOLOGY CENTER PC does not take over the authors' copyright; it receives the authors' permission to use and disseminate the publication through the world's scientific resources (its own electronic resources, scientometric databases, repositories, libraries, etc.).
If no License Agreement has been signed, or if the agreement lacks identifiers that make it possible to establish the author's identity, the editors have no right to work with the manuscript.
It is important to remember that there is another type of agreement between authors and publishers – when copyright is transferred from the authors to the publisher. In this case, the authors lose ownership of their work and may not use it in any way.