A modified method of self-recovery of distributed software in heterogeneous computer systems

Authors

DOI:

https://doi.org/10.30837/ITSSI.2024.27.005

Keywords:

self-healing methods; software; distributed computing; computer systems; cloud architectures; software agents.

Abstract

The object of research is the distributed computing process in heterogeneous computer systems. The subject of research is methods of self-healing for distributed software in heterogeneous computer systems. The goal is to increase the efficiency of distributed data processing systems while maintaining the functional stability of the computing process, by developing a modified method of self-healing of distributed software. Tasks: to investigate existing methods of restoring the distributed computing process and identify their advantages and disadvantages; to develop, on the basis of mathematical models of tasks and computing resources and of existing resource allocation methods, a modification of the method of self-recovery of distributed software that takes management strategies into account, finds the best solution for the selected criteria, and reduces energy consumption during task execution; and to conduct a series of experiments comparing the developed method with existing ones. Research methods are based on set theory, general systems theory, and simulation modeling theory. Results: experiments that simulated both the allocation of software tasks to computing resources and the computing process during self-recovery after resource failures confirm the effectiveness of the proposed method. Conclusion: applying the method in distributed computing control systems does not increase the time the system spends on a task in the absence of failures; in the presence of failures, it restores the functionality of the software task faster, reducing execution time by 8–17% and energy consumption by 7–12%. The gain in efficiency grows with the size of the tasks and the probability of failures. The development of technologies for the automated or automatic application of resource allocation and self-recovery methods is identified as an area for future research.
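The general idea of self-recovery by reallocation can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not give the models or the selection criterion, so the `Resource` fields, the equal weights in `cost`, and the greedy choice in `assign` are all assumptions made for illustration. The sketch places a task on the live resource with the lowest weighted sum of finish time and energy, and on failure re-places the unfinished tasks of the failed resource the same way.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    speed: float              # work units per second (hypothetical unit)
    power: float              # watts drawn while busy (hypothetical)
    alive: bool = True
    busy_until: float = 0.0   # time at which the resource becomes free


def cost(task_size, res, now, w_time=0.5, w_energy=0.5):
    """Return (weighted criterion, finish time) for running a task on res.

    The weights are assumed; time and energy are not normalized here.
    """
    run_time = task_size / res.speed
    finish = max(now, res.busy_until) + run_time
    energy = run_time * res.power
    return w_time * finish + w_energy * energy, finish


def assign(task_size, resources, now=0.0):
    """Greedily place a task on the live resource with the lowest cost."""
    live = [r for r in resources if r.alive]
    if not live:
        raise RuntimeError("no live resources left")
    best = min(live, key=lambda r: cost(task_size, r, now)[0])
    best.busy_until = cost(task_size, best, now)[1]
    return best


def recover(failed, pending, resources, now):
    """Mark a resource as failed and re-place its unfinished tasks."""
    failed.alive = False
    return {tid: assign(size, resources, now) for tid, size in pending.items()}


pool = [
    Resource("cpu", speed=1.0, power=10.0),
    Resource("gpu", speed=4.0, power=40.0),
    Resource("arm", speed=0.5, power=2.0),
]
first = assign(8.0, pool)                            # initial placement
moved = recover(first, {"t1": 8.0}, pool, now=1.0)   # placement after failure
```

In this toy pool the task first goes to the low-power `arm` resource and, once that resource fails, is moved to `gpu`. A real system would also restore task state from a checkpoint or journal, which the sketch leaves out.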

Author Biographies

Maksym Volk, Kharkiv National University of Radio Electronics

Doctor of Sciences (Engineering), Professor, Professor at the Department of Electronic Computers

Maksym Hora, Kharkiv National University of Radio Electronics

Postgraduate Student at the Department of Electronic Computers

Published

2024-07-02

How to Cite

Volk, M., & Hora, M. (2024). A modified method of self-recovery of distributed software in heterogeneous computer systems. INNOVATIVE TECHNOLOGIES AND SCIENTIFIC SOLUTIONS FOR INDUSTRIES, (1 (27)), 5–17. https://doi.org/10.30837/ITSSI.2024.27.005