Architecture of an automated program complex based on a multiple kernel SVM classifier for analyzing malicious executable files

Authors

Alan Nafiiev, Andrii Rodionov

DOI:

https://doi.org/10.30837/2522-9818.2024.29.039

Keywords:

cybersecurity; malware detection; automated program complex; static analysis; dynamic analysis; Drakvuf; IDA Pro; multiple kernel.

Abstract

Subject matter. This article presents the architecture and development of an automated program complex that identifies and analyzes malicious executable files using a classifier based on a multiple kernel support vector machine (SVM).

Goal. The aim of the work is to create an automated system that improves the accuracy and efficiency of malware detection by combining static and dynamic analysis in a single framework capable of processing large volumes of data while keeping analysis time low.

Tasks. To achieve this goal, the following tasks were carried out: developing a program complex that automates the collection of static and dynamic data from executable files using IDA Pro, IDAPython, and Drakvuf; integrating a multiple kernel SVM classifier to analyze the collected heterogeneous data; validating the system's effectiveness on a dataset of 1,389 executable samples; and demonstrating the system's scalability and practical applicability in real-world conditions.

Methods. A hybrid approach combines static analysis (extracting byte code, disassembled instructions, and control flow graphs using IDA Pro and IDAPython) with dynamic analysis (monitoring run-time behavior using Drakvuf). The multiple kernel SVM classifier integrates the different data representations through separate kernels, so that both linear and nonlinear relationships are taken into account during classification.

Results. The system achieves a high level of accuracy and recall, as evidenced by an F-score of 0.93 together with the reported ROC AUC and PR AUC values. The automated program complex reduces the analysis time for a single file from an average of 11 minutes to approximately 5 minutes, roughly doubling throughput compared with previous methods. This reduction in processing time matters for deployment in environments where rapid and accurate malware detection is required. The system's scalability also allows large data volumes to be processed efficiently, making it suitable for real-world applications.

Conclusions. The automated program complex developed in this study demonstrates significant improvements in the accuracy and efficiency of malware detection. By integrating multiple kernel SVM classification with static and dynamic analysis, the system shows potential for real-time malware detection and analysis. Its scalability and practical applicability indicate that it could become an important tool for combating modern cyber threats, giving organizations an effective means of strengthening their cybersecurity.
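
To illustrate the multiple kernel idea mentioned in the abstract, here is a minimal sketch of how separate kernels over static and dynamic features could be merged into one Gram matrix. The abstract does not say which kernels or weights the authors use, so the linear and RBF kernels, the equal weights, the `gamma` value, and the toy feature vectors below are all assumptions chosen for illustration only.

```python
import math


def linear_kernel(a, b):
    """Dot product of two feature vectors (captures linear relationships)."""
    return sum(x * y for x, y in zip(a, b))


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel; gamma=0.5 is an illustrative value."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def gram(kernel, rows_a, rows_b):
    """Gram matrix with entry [i][j] = kernel(rows_a[i], rows_b[j])."""
    return [[kernel(a, b) for b in rows_b] for a in rows_a]


def combine(grams, weights):
    """Weighted sum of equally shaped Gram matrices.

    With non-negative weights, a sum of valid kernels is itself a valid kernel.
    """
    n_rows, n_cols = len(grams[0]), len(grams[0][0])
    return [
        [sum(w * g[i][j] for g, w in zip(grams, weights)) for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Hypothetical toy data: one row per executable sample.
static_features = [[3.0, 0.0, 1.0], [2.0, 1.0, 0.0], [0.0, 4.0, 2.0]]
dynamic_features = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]]

# One kernel per data representation (assumed pairing, not from the paper).
k_static = gram(linear_kernel, static_features, static_features)
k_dynamic = gram(rbf_kernel, dynamic_features, dynamic_features)

# Equal weights are an assumption; in practice they would be tuned or learned.
k_combined = combine([k_static, k_dynamic], weights=[0.5, 0.5])
```

The resulting `k_combined` matrix can be handed to any SVM solver that accepts a precomputed Gram matrix, for example scikit-learn's `SVC(kernel="precomputed")`.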

Author Biographies

Alan Nafiiev, National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"

Institute of Physics and Technology, PhD student

Andrii Rodionov, National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"

PhD (Engineering Sciences), Associate Professor, Institute of Physics and Technology, Kyiv, Ukraine

Published

2024-09-30

How to Cite

Nafiiev, A., & Rodionov, A. (2024). Architecture of an automated program complex based on a multiple kernel SVM classifier for analyzing malicious executable files. INNOVATIVE TECHNOLOGIES AND SCIENTIFIC SOLUTIONS FOR INDUSTRIES, (3 (29)), 39–47. https://doi.org/10.30837/2522-9818.2024.29.039