Modeling of adaptive UAV route control based on reinforcement learning algorithms

Authors

DOI:

https://doi.org/10.30837/2522-9818.2026.1.028

Keywords:

adaptive control; Proximal Policy Optimization; reinforcement learning; simulation modeling; routing; 3D navigation

Abstract

Subject matter is the reward function, action policy, and learning dynamics of the Proximal Policy Optimization (PPO) algorithm in the task of adaptive UAV navigation under dynamic airspace conditions and limited energy resources. Goal is to develop a simulation environment and a modified PPO model for adaptive route management of a single UAV in 2D and 3D environments, taking into account the distance to the target, collision risk, and energy consumption. Tasks: to develop 2D and 3D simulation environments with different obstacle configurations and UAV motion parameters; to formulate a combined PPO reward function that incorporates distance to the target, collisions, and energy consumption; to implement and train PPO, DQN, and A2C algorithms in standardized navigation scenarios; to perform a comparative analysis of algorithm performance using key metrics (path length, number of collisions, reward, and energy consumption); to conduct statistical validation of the results using the t-test and confidence intervals; to analyze the influence of PPO hyperparameters on policy stability and learning convergence in 2D and 3D environments. Methods: deep reinforcement learning algorithms (PPO, DQN, A2C) were applied; two simulation models (2D and 3D) with randomly generated static obstacles were developed; a combined reward function was formulated, integrating distance-to-target progress, collision penalties, and an energy-related component; model performance was evaluated using average reward, path length, number of collisions, and total energy expenditure; statistical significance was assessed using the t-test and 95% confidence intervals. Results: the modified PPO model reduced the number of collisions in the 2D environment by 94.8% and shortened the route length by 94.3% compared to the baseline PPO, while exhibiting higher energy consumption due to more complex avoidance maneuvers.
In the 3D environment, similar trends were confirmed, including improved navigation safety, more stable policy behavior, and statistically significant improvements across key metrics (p < 0.05). Conclusions: a unified 2D/3D simulation environment for adaptive UAV routing and a modified PPO model with a combined reward function were developed. In the 2D environment, the model achieved a ≈94.8% reduction in collisions, a ≈94.3% reduction in path length, and a ≈92.5% increase in average reward compared to the baseline PPO. In the 3D environment, analogous improvements and statistically significant gains (p < 0.05) were obtained. A relationship between avoidance aggressiveness and energy consumption was identified, enabling selection of an optimal policy for BVLOS (beyond visual line of sight) scenarios.
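The combined reward function is described in the abstract only by its three components (distance-to-target progress, collision penalties, an energy-related term); the paper's exact coefficients are not given here. A minimal sketch of such a per-step reward, with illustrative (hypothetical) weights `W_PROGRESS`, `W_COLLISION`, `W_ENERGY` and a hypothetical `step_reward` helper, could look like:

```python
# Hypothetical weights -- illustrative only; the paper's exact
# coefficients are not stated in this abstract.
W_PROGRESS = 1.0      # reward per unit of distance closed toward the target
W_COLLISION = -100.0  # one-off penalty when a collision occurs
W_ENERGY = -0.01      # penalty per unit of energy spent on the step

def step_reward(prev_dist, curr_dist, collided, energy_used):
    """Combined reward: distance-to-target progress, collision
    penalty, and an energy-related component."""
    progress = prev_dist - curr_dist   # positive when the UAV moves closer
    reward = W_PROGRESS * progress
    if collided:
        reward += W_COLLISION
    reward += W_ENERGY * energy_used
    return reward
```

Under such a shaping scheme, more aggressive avoidance maneuvers lower the collision term but raise the energy term, which is consistent with the energy/safety trade-off reported in the conclusions.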
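The statistical validation step (t-test and 95% confidence intervals over per-run metrics) can be sketched with the Python standard library alone; the Welch t-statistic and the z = 1.96 normal approximation below are illustrative assumptions, not necessarily the paper's exact procedure:

```python
import statistics

def mean_ci95(samples):
    """Sample mean and 95% confidence half-width (normal
    approximation, z = 1.96)."""
    n = len(samples)
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / n ** 0.5  # standard error of the mean
    return mean, 1.96 * sem

def welch_t(a, b):
    """Welch's t statistic for two independent samples
    (does not assume equal variances)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / (va / na + vb / nb) ** 0.5
```

Comparing, say, per-episode collision counts of the baseline and modified PPO with `welch_t` and reporting each mean with `mean_ci95` mirrors the "t-test and 95% confidence intervals" evaluation described in the Methods.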

Author Biographies

Maksym Yena, National Aerospace University "Kharkiv Aviation Institute"

PhD Student, Department of Information Technology Design

Olha Pohudina, National Aerospace University "Kharkiv Aviation Institute"

Candidate of Technical Sciences, Associate Professor, Department of Information Technology Design

References


Debnath, D., Vanegas, F., Sandino, J., Hawary, A. F., Gonzalez, F. (2024), "A review of UAV path-planning algorithms and obstacle avoidance methods for remote sensing applications", Remote Sensing, Vol. 16 (21), article 4019. DOI: https://doi.org/10.3390/rs16214019

Martins, F. G., Coelho, M. A. N. (2000), "Application of feedforward artificial neural networks to improve process control of PID-based control algorithms", Computers & Chemical Engineering, Vol. 24 (2–7), pp. 853–858. DOI: https://doi.org/10.1016/S0098-1354(00)00339-2

Liu, X., Peng, Z. R., Zhang, L. Y. (2019), "Real-time UAV rerouting for traffic monitoring with decomposition based multi-objective optimization", Journal of Intelligent & Robotic Systems, Vol. 94, pp. 491–501. DOI: https://doi.org/10.1007/s10846-018-0806-8

Almeida, E. N., Campos, R., Ricardo, M. (2022), "Traffic-aware UAV placement using a generalizable deep reinforcement learning methodology", 2022 IEEE Symposium on Computers and Communications (ISCC), pp. 1–6. DOI: https://doi.org/10.48550/arXiv.2203.08924

Madani, A., Engelbrecht, A., Ombuki-Berman, B. (2023), "Cooperative coevolutionary multi-guide particle swarm optimization algorithm for large-scale multi-objective optimization problems", Swarm and Evolutionary Computation, Vol. 82, article 101262. DOI: https://doi.org/10.1016/j.swevo.2023.101262

Luo, J., Tian, Y., Wang, Z. (2024), "Research on unmanned aerial vehicle path planning", Drones, Vol. 8 (2), article 51. DOI: https://doi.org/10.3390/drones8020051

Li, C., Lian, J. (2007), "The application of immune genetic algorithm in PID parameter optimization for level control system", Proceedings of the 2007 IEEE International Conference on Automation and Logistics (ICAL), Jinan, China, pp. 2670–2674. DOI: https://doi.org/10.1109/ICAL.2007.4338670

Yang, F., Lu, Q., Li, R., Xu, Y., Yuan, W., Wu, X. (2023), "Real-time optimal path planning and fast autonomous flight for UAV in unknown environments", IEEE. DOI: https://doi.org/10.23919/CCC58697.2023.10240971

Li, Q., Li, R., Ji, K., Dai, W. (2015), "Kalman filter and its application", IEEE. DOI: https://doi.org/10.1109/ICINIS.2015.35

Hooshyar, M., Huang, Y. (2023), "Meta-heuristic algorithms in UAV path planning optimization: A systematic review (2018–2022)", Drones, Vol. 7 (12), article 687. DOI: https://doi.org/10.3390/drones7120687

Li, H., Zhang, Z.-yu. (2012), "The application of immune genetic algorithm in main steam temperature of PID control of BP network", Physics Procedia, Vol. 25, pp. 80–86. DOI: https://doi.org/10.1016/j.phpro.2012.02.013

Zhang, M., Liu, Y., Wang, Y., Li, F., Chen, L. (2023), "Real-time path planning algorithms for autonomous UAV", IEEE. DOI: https://doi.org/10.1109/CAC57257.2022.10054770

Kim, D. H. (2003), "Comparison of PID controller tuning of power plant using immune and genetic algorithms", The 3rd International Workshop on Scientific Use of Submarine Cables and Related Technologies, Lugano, Switzerland, pp. 358–363. DOI: https://doi.org/10.1109/CIMSA.2003.1227222

Yena, M. (2024), "Optimizing air traffic control: Innovative approaches to collision avoidance in UAV operations", Integrated Computer Technologies in Mechanical Engineering – 2023 (ICTM 2023), pp. 543–553. DOI: https://doi.org/10.1007/978-3-031-60549-9_41

Yena, M., Pohudina, O. (2025), "Integrated simulation model of swarm control and adaptive routeing of UAVs in a changing air environment", Innovative Technologies and Scientific Solutions for Industries, No. 4 (34), pp. 32–43. DOI: https://doi.org/10.30837/2522-9818.2025.4.032

Downloads

Published

2026-03-30

How to Cite

Yena, M., & Pohudina, O. (2026). Modeling of adaptive UAV route control based on reinforcement learning algorithms. INNOVATIVE TECHNOLOGIES AND SCIENTIFIC SOLUTIONS FOR INDUSTRIES, No. 1 (35), pp. 28–38. https://doi.org/10.30837/2522-9818.2026.1.028