A modified method of self-recovery of distributed software in heterogeneous computer systems
DOI:
https://doi.org/10.30837/ITSSI.2024.27.005
Keywords:
self-healing methods; software; distributed computing; computer systems; cloud architectures; software agents.
Abstract
The object of research is the distributed computing process in heterogeneous computer systems. The subject of research is methods of self-healing for distributed software in heterogeneous computer systems. The goal is to increase the efficiency of distributed data processing systems while maintaining the functional stability of the computing process, by developing a modified method of self-healing of distributed software. Tasks: to investigate existing methods of restoring the distributed computing process and draw conclusions about their advantages and disadvantages; to develop, on the basis of mathematical models of tasks, computing resources, and existing resource allocation methods, a modification of the method of self-recovery of distributed software that takes management strategies into account, finds the best solution for the selected criteria, and reduces energy consumption during task execution; to conduct a series of experiments comparing the developed method with existing ones. Research methods are based on set theory, general systems theory, and simulation modeling theory. The results of experiments that simulated the allocation of software tasks to computing resources, and the computing process during self-recovery from resource failures, confirm the effectiveness of the proposed method. Conclusion: applying the method in distributed computing control systems does not increase the time the system spends on a task in the absence of failures. In the presence of failures, it restores the functionality of the software task faster, reducing execution time by 8–17% and energy consumption by 7–12%. The efficiency gain grows with the size of the tasks and the probability of failures.
Areas for future research include the development of technologies for the automated or automatic application of resource allocation and self-recovery methods.
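The general idea summarized above, allocating tasks to heterogeneous resources by a time and energy criterion and reassigning work when a resource fails, can be illustrated with a minimal sketch. This is not the authors' method: the greedy rule, the `Resource` fields, the equal weights in `cost`, and all numeric values are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    speed: float              # work units per second (assumed model)
    power: float              # watts drawn while busy (assumed model)
    alive: bool = True
    busy_until: float = 0.0   # time at which the resource becomes free


def cost(task_size, res, w_time=0.5, w_energy=0.5):
    """Hypothetical criterion: weighted sum of finish time and energy used.

    The two terms have different units; a real criterion would normalize them.
    """
    run_time = task_size / res.speed
    finish = res.busy_until + run_time
    energy = run_time * res.power
    return w_time * finish + w_energy * energy


def allocate(tasks, resources):
    """Greedy allocation: each task goes to the live resource with lowest cost."""
    plan = {}
    for task_id, size in tasks.items():
        live = [r for r in resources if r.alive]
        if not live:
            raise RuntimeError("no live resources left")
        best = min(live, key=lambda r: cost(size, r))
        best.busy_until += size / best.speed
        plan[task_id] = best.name
    return plan


def recover(plan, tasks, resources, failed_name):
    """Mark a resource as failed and reassign only the tasks it was holding."""
    for r in resources:
        if r.name == failed_name:
            r.alive = False
    orphaned = {t: tasks[t] for t, name in plan.items() if name == failed_name}
    plan.update(allocate(orphaned, resources))
    return plan


# Usage with made-up numbers: two resources, three tasks, then one failure.
resources = [Resource("cpu", speed=1.0, power=10.0),
             Resource("gpu", speed=4.0, power=30.0)]
tasks = {"t1": 8.0, "t2": 4.0, "t3": 2.0}
plan = allocate(tasks, resources)
plan = recover(plan, tasks, resources, failed_name="gpu")
```

Only the tasks held by the failed resource are reallocated, so a run without failures pays no extra cost for the recovery step, which mirrors the behaviour the abstract reports in qualitative terms.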
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.