Ahmad Aws
Bauman Moscow State Technical University (BMSTU), Robotic Systems and Mechatronics Department, Postgraduate Student, 5-1, 2-ya Baumanskaya ul., Moscow, 105005, Russia
Arkadij S. Yuschenko
Doctor of Technical Science, Professor, (BMSTU), Deputy Head of Chair, 5-1, 2-ya Baumanskaya ul., Moscow, 105005, Russia
Vladimir I. Soloviev
Doctor of Economics, Professor, Limited Liability Company «Center for Intelligent Analytical and Robotic Systems» (LLC «CIARS»), General Director, 31, Tcentralnaya ul., Balashikha, 143914, Russia; Moscow Technical University of Communications and Informatics, Head of Chair of Applied Artificial Intelligence
Received September 12, 2023
Abstract
This paper addresses the development and implementation of reinforcement-learning-based control algorithms for positioning an Autonomous Underwater Vehicle (AUV) with an undulating propulsor. It reviews prior work that applies actor-only, critic-only, and actor-critic reinforcement learning methods. The main focus is the Deep Deterministic Policy Gradient (DDPG) method and its implementation with deep neural networks for training an actor-critic agent. The agent's architecture uses a replay buffer and target neural networks to counter the correlation in the training data, which otherwise causes training instability. An adaptive architecture is proposed for training the agent to drive the robot from its initial point to any target point. In addition, a random target point generator is used during training, so the agent does not need to be retrained when the target points change. The training objective is to optimize the actor's policy by training the critic and maximizing the reward. The reward function is defined in terms of the distance from the robot's center of mass to the target point: the reward increases as the robot approaches the target point and reaches its maximum when the target point is reached within an acceptable error.
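The abstract names four training ingredients: a replay buffer, target networks, a random target point generator, and a distance-based reward. The Python sketch below illustrates one common way to realize each of them. It is not the authors' implementation. The abstract gives no exact reward formula, workspace bounds, or target-update coefficient, so the names `ReplayBuffer`, `soft_update`, `random_target`, and `reward`, along with the parameters `tau`, `tolerance`, and `goal_bonus`, are hypothetical. Network weights are modelled as plain float lists so that the example stays dependency-free. The reward uses the negative Euclidean distance, which grows as the robot approaches the goal, plus a constant bonus inside the acceptance radius.

```python
import math
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Transition:
    state: tuple
    action: tuple
    reward: float
    next_state: tuple
    done: bool


class ReplayBuffer:
    """Fixed-capacity FIFO store. Sampling uniformly at random breaks the
    temporal correlation between consecutive transitions."""

    def __init__(self, capacity: int, seed: int = 0):
        self.memory = deque(maxlen=capacity)  # oldest transitions are dropped first
        self.rng = random.Random(seed)

    def push(self, transition: Transition) -> None:
        self.memory.append(transition)

    def sample(self, batch_size: int) -> list:
        # Uniform sampling without replacement over the stored transitions.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self) -> int:
        return len(self.memory)


def soft_update(target: list, source: list, tau: float) -> list:
    """Polyak averaging as used for DDPG target networks:
    theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]


def random_target(rng: random.Random, x_range: tuple, y_range: tuple) -> tuple:
    """Hypothetical generator: draw a fresh goal point each episode so the
    learned policy is not tied to one fixed target."""
    return (rng.uniform(*x_range), rng.uniform(*y_range))


def reward(position: tuple, target: tuple,
           tolerance: float = 0.05, goal_bonus: float = 10.0) -> float:
    """Hypothetical shaping: negative distance from the center of mass to
    the target, replaced by a constant bonus once within `tolerance`."""
    d = math.dist(position, target)
    return goal_bonus if d <= tolerance else -d


if __name__ == "__main__":
    rng = random.Random(42)
    goal = random_target(rng, (-1.0, 1.0), (-1.0, 1.0))
    print("goal:", goal, "reward at origin:", reward((0.0, 0.0), goal))
```

With this shaping, the bonus exceeds every negative-distance value, so the reward is maximal exactly when the target is reached within the acceptable error, as the abstract requires. A smaller `tau` makes the target networks track the learned networks more slowly. This typically stabilizes the critic's bootstrapped targets at the cost of slower propagation of updates.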
Key words
Autonomous Underwater Vehicle (AUV), end-to-end reinforcement learning algorithm, Reward Function, Replay Buffer, Undulating Motion.
DOI
10.31776/RTCJ.12105
Bibliographic description
Aws Ahmad, Yuschenko, A.S. and Soloviev, V.I. (2024), "End-to-end deep reinforcement learning for control of an autonomous underwater robot with an undulating propulsor", Robotics and Technical Cybernetics, vol. 12, no. 1, pp. 36-45, DOI: 10.31776/RTCJ.12105. (in Russian).
UDC identifier
004.89:004.4:629.58