IEEE SoutheastCon

Deep Reinforcement Learning For Visual Navigation of Wheeled Mobile Robots



Abstract

A study is presented on applying deep reinforcement learning (DRL) for visual navigation of wheeled mobile robots (WMR) in dynamic and unknown environments. Two DRL algorithms are considered: the value-learning deep Q-network (DQN) and the policy-gradient-based asynchronous advantage actor-critic (A3C). RGB (red, green, and blue) and depth images are used as inputs to both DRL algorithms to generate control commands for autonomous navigation of a WMR in simulation environments. The initial DRL networks were generated and trained progressively in OpenAI Gym Gazebo-based simulation environments within the robot operating system (ROS) framework for a popular target WMR, the Kobuki TurtleBot2. A pre-trained deep neural network, ResNet50, was used after further training with regrouped objects commonly found in a laboratory setting for target-driven mapless visual navigation of the TurtleBot2 through DRL. The performance of A3C with multiple computation threads (4, 6, and 8) was simulated on a desktop. The navigation performance of the DQN and A3C networks, in terms of reward statistics and completion time, was compared in three simulation environments. As expected, A3C with multiple threads (4, 6, and 8) performed better than DQN, and the performance of A3C improved with the number of threads. Details of the methodology and simulation results are presented, and recommendations for future work towards real-time implementation through transfer learning of the DRL models are outlined.
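The value-learning update behind DQN can be sketched as follows. This is a minimal illustration, not the paper's implementation: the action set, discount factor, and Q-values are hypothetical stand-ins (the paper's networks consume RGB and depth images rather than the toy vectors used here).

```python
import numpy as np

ACTIONS = ["forward", "turn_left", "turn_right"]  # assumed discrete WMR commands
GAMMA = 0.99  # assumed discount factor

def dqn_target(reward, done, next_q_values, gamma=GAMMA):
    """Bellman target y = r + gamma * max_a' Q(s', a'); no bootstrap at terminal states."""
    return reward + (0.0 if done else gamma * float(np.max(next_q_values)))

def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
q_next = np.array([0.2, 0.5, 0.1])          # illustrative Q-values for the next state
y = dqn_target(reward=1.0, done=False, next_q_values=q_next)
a = epsilon_greedy(q_next, epsilon=0.0, rng=rng)
```

The network is then trained to regress its Q-value for the taken action toward the target `y`; the epsilon-greedy policy balances exploration against exploiting the current value estimates.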
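The asynchronous pattern that distinguishes A3C can likewise be sketched: several worker threads (4, 6, or 8 in the paper's experiments) each compute n-step returns from their own rollouts and push updates to shared parameters. The reward trace and worker count below are illustrative assumptions, not values from the paper.

```python
import threading
import numpy as np

def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted n-step returns R_t = r_t + gamma * R_{t+1}, computed backwards."""
    R = bootstrap_value
    out = []
    for r in reversed(rewards):
        R = r + gamma * R
        out.append(R)
    return list(reversed(out))

shared_updates = []          # stand-in for shared global parameters
lock = threading.Lock()

def worker(rewards):
    # Each worker rolls out its own copy of the policy, then pushes an
    # update asynchronously; a lock guards the shared state here.
    returns = n_step_returns(rewards, bootstrap_value=0.0)
    with lock:
        shared_updates.append(returns[0])

threads = [threading.Thread(target=worker, args=([1.0, 0.0, 1.0],))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In full A3C, each worker subtracts a learned value baseline from these returns to form advantages for the policy-gradient step; more workers decorrelate the experience reaching the shared network, which is consistent with the reported improvement as the thread count grows.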
