End-to-end deep reinforcement learning for control of an autonomous underwater robot with an undulating propulsor

Ahmad Aws
Bauman Moscow State Technical University (BMSTU), Robotic Systems and Mechatronics Department, Postgraduate Student, 5-1, 2-ya Baumanskaya ul., Moscow, 105005, Russia

Arkadij S. Yuschenko
Doctor of Technical Science, Professor, Bauman Moscow State Technical University (BMSTU), Deputy Head of Chair, 5-1, 2-ya Baumanskaya ul., Moscow, 105005, Russia

Vladimir I. Soloviev
Doctor of Economics, Professor, Limited Liability Company «Center for Intelligent Analytical and Robotic Systems» (LLC «CIARS»), General Director, 31, Tcentralnaya ul., Balashikha, 143914, Russia; Moscow Technical University of Communications and Informatics, Head of Chair of Applied Artificial Intelligence


Received September 12, 2023

Abstract
This paper focuses on the development and implementation of control algorithms for positioning an Autonomous Underwater Vehicle (AUV) with an undulating propulsor using reinforcement learning methods. It reviews and analyzes works that employ reinforcement learning methods of the Actor-only, Critic-only, and Actor-Critic types. The paper concentrates on the Deep Deterministic Policy Gradient (DDPG) method and its implementation with deep neural networks that train the Actor-Critic agent. The agent's architecture employs a replay buffer and target neural networks to address the data correlation that induces training instability. An adaptive architecture is proposed for training the agent to drive the robot from its initial point to an arbitrary target point. In addition, a random target-point generator is incorporated at the training stage so that the agent does not have to be retrained when the target point changes. The training objective is to optimize the actor's policy by training the critic and maximizing the expected reward. The reward function is defined in terms of the distance from the robot's center of mass to the target point: the reward received by the agent increases as the robot approaches the target and becomes maximal when the target point is reached within an acceptable error.
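
To make these ingredients concrete, the following minimal Python sketch illustrates a distance-based reward of the kind described above, a random target-point generator, a uniform replay buffer, and the soft target-network update commonly used with DDPG. The sketch is ours, not taken from the paper: the function names, the uniform sampling of targets, the arrival bonus, and all parameter values (eps, goal_bonus, capacity, tau) are assumptions made for illustration.

import random
from collections import deque

import numpy as np


def sample_target(low, high):
    """Random target-point generator: during training, targets are drawn
    at random (here, uniformly inside a box) so the trained agent is not
    tied to one fixed target point."""
    return np.random.uniform(low, high)


def reward(com, target, eps=0.05, goal_bonus=10.0):
    """Distance-based reward: the closer the robot's center of mass `com`
    is to `target`, the larger the reward; a bonus marks arrival within
    the acceptable error `eps`."""
    d = float(np.linalg.norm(np.asarray(com) - np.asarray(target)))
    return goal_bonus if d < eps else -d


class ReplayBuffer:
    """Uniform replay buffer: sampling past transitions at random breaks
    the temporal correlation of consecutive samples that destabilizes
    training."""

    def __init__(self, capacity=100_000):
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, rew, next_state, done):
        self.storage.append((state, action, rew, next_state, done))

    def sample(self, batch_size=64):
        batch = random.sample(self.storage, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (np.array(states), np.array(actions), np.array(rewards),
                np.array(next_states), np.array(dones))


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target-network weights (numpy arrays) slowly
    track the online networks, the standard stabilization used with
    DDPG's target actor and critic."""
    for tp, op in zip(target_params, online_params):
        tp[...] = (1.0 - tau) * tp + tau * op

In a DDPG training loop of this kind, transitions are pushed into the buffer at every step, minibatches sampled from it decorrelate the critic's training data, and the soft update is applied to the target actor and critic after each gradient step.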

Key words
autonomous underwater vehicle (AUV), end-to-end reinforcement learning algorithm, reward function, replay buffer, undulating motion.

DOI
10.31776/RTCJ.12105

Bibliographic description
Aws Ahmad, Yuschenko, A.S. and Soloviev, V.I. (2024), "End-to-end deep reinforcement learning for control of an autonomous underwater robot with an undulating propulsor", Robotics and Technical Cybernetics, vol. 12, no. 1, pp. 36-45, DOI: 10.31776/RTCJ.12105. (in Russian).

UDC identifier
004.89:004.4:629.58

References

  1. Yang, H. et al. (2021), “Research on underwater object recognition based on YOLOv3”, Microsystem Technologies, vol. 27, pp. 1837-1844.
  2. Jin, L. and Liang, H. (2017), “Deep learning for underwater image recognition in small sample size situations”, OCEANS 2017-Aberdeen, IEEE, pp. 1-4.
  3. Mnih, V. et al. (2015), “Human-level control through deep reinforcement learning”, Nature, vol. 518, no. 7540, pp. 529-533.
  4. Sutton, R.S. and Barto, A.G. (2018), Reinforcement learning: An introduction, MIT press.
  5. Niv, Y. (2009), “Reinforcement learning in the brain”, Journal of Mathematical Psychology, vol. 53, no. 3, pp. 139-154.
  6. Li, Y. (2017), “Deep reinforcement learning: An overview”, arXiv preprint arXiv:1701.07274.
  7. Gaskett, C., Wettergreen, D. and Zelinsky, A. (1999), “Reinforcement learning applied to the control of an autonomous underwater vehicle”, in Proceedings of the Australian Conference on Robotics and Automation (AUCRA99), pp. 125-131.
  8. Liu, B. and Lu, Z. (2013), “Auv path planning under ocean current based on reinforcement learning in electronic chart”, 2013 International Conference on Computational and Information Sciences, IEEE, pp. 1939-1942.
  9. Sun, Y. et al. (2020), “AUV path following controlled by modified Deep Deterministic Policy Gradient”, Ocean Engineering, vol. 210, p. 107360.
  10. Grondman, I. et al. (2012), “A survey of actor-critic reinforcement learning: Standard and natural policy gradients”, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 42, no. 6, pp. 1291-1307.
  11. Schulman, J. et al. (2017), “Proximal policy optimization algorithms”, arXiv preprint arXiv:1707.06347.
  12. Han, S. et al. (2021), “Regularly updated deterministic policy gradient algorithm”, Knowledge-Based Systems, vol. 214, p. 106736.
  13. Silver, D. et al. (2014), “Deterministic policy gradient algorithms”, in International Conference on Machine Learning, PMLR, pp. 387-395.
  14. Lillicrap, T.P. et al. (2015), “Continuous control with deep reinforcement learning”, arXiv preprint arXiv:1509.02971.
  15. Ahmad, Aws, Wassouf, Y., Konovalov, K.V. and Yushchenko, A.S. (2022), “Study of an underwater robot with a wave-like propulsion device”, Mekhatronika, Avtomatizatsiya, Upravlenie, vol. 23, no. 11, pp. 607–616.
  16. Ahmad, Aws, and Yushchenko, A.S. (2022), “Dynamic model of an underwater mobile robot with undulating propulsors”, in Proceedings of International Scientific and Technological Conference “Extreme Robotics”, pp. 243-252.