IFAC PapersOnLine

Intelligent Autonomous Navigation of Car-Like Unmanned Ground Vehicle via Deep Reinforcement Learning

Abstract

In this paper, a car-like Unmanned Ground Vehicle (UGV) is simulated and trained as an intelligent agent to navigate and exit unknown, obstacle-filled environments with no prior knowledge of environment characteristics, using a Reinforcement Learning (RL) algorithm tailored for continuous action spaces. This is achieved using Deep Deterministic Policy Gradient (DDPG), an actor-critic method that combines several cutting-edge Artificial Intelligence techniques, including continuous deep Q-learning and policy gradient methods. Two feedforward neural networks with Rectified Linear Units (ReLU) serve as the critic and actor representations, combining policy-based and value-based methods to learn continuous-action-space policies via function approximation. In this architecture, the actor network maps the current state inputs to linear and angular velocity outputs drawn from a continuous action space; the critic network then evaluates these actions, learning to estimate Q-values by minimizing a loss function. The proposed DDPG RL network is trained and evaluated in two obstacle-filled environments for a car-like UGV with wheelbase l = 0.3 m. During the 10,000-episode training period, the agent converges to a maximum reward of 180 after 1,100 training episodes in the first environment, and a maximum reward of 80 after 7,500 training episodes in the second, more complex environment. At the end of each training period, the agent is shown to exhibit intelligent, human-like learning behavior, learning optimal policies and adapting to new environments with no changes to the network architecture.
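The actor-critic structure summarized in the abstract can be illustrated with a minimal NumPy sketch: an actor network mapping the state to bounded linear and angular velocities, and a critic network scoring a state-action pair with a Q-value. The state dimension, layer width, and velocity bounds below are illustrative assumptions (the paper does not specify them here), and training machinery such as the replay buffer, target networks, and gradient updates is omitted.

```python
import numpy as np

def relu(x):
    # Rectified Linear Unit, as used in both networks of the paper
    return np.maximum(0.0, x)

def init_layer(n_in, n_out, rng):
    # Small random weights; a sketch, not a tuned initialization
    return rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out)

rng = np.random.default_rng(0)

STATE_DIM = 10            # hypothetical: e.g. range readings plus pose
ACTION_DIM = 2            # linear velocity v and angular velocity omega
V_MAX, W_MAX = 1.0, 1.0   # hypothetical velocity bounds

# Actor: state -> continuous action [v, omega]
aW1, ab1 = init_layer(STATE_DIM, 64, rng)
aW2, ab2 = init_layer(64, ACTION_DIM, rng)

def actor(state):
    h = relu(state @ aW1 + ab1)
    # tanh squashes the output, which is then scaled to the velocity limits
    return np.tanh(h @ aW2 + ab2) * np.array([V_MAX, W_MAX])

# Critic: (state, action) -> scalar Q-value estimate
cW1, cb1 = init_layer(STATE_DIM + ACTION_DIM, 64, rng)
cW2, cb2 = init_layer(64, 1, rng)

def critic(state, action):
    h = relu(np.concatenate([state, action]) @ cW1 + cb1)
    return (h @ cW2 + cb2).item()

# One forward pass: the actor proposes velocities, the critic scores them
state = rng.standard_normal(STATE_DIM)
action = actor(state)
q_value = critic(state, action)
```

In full DDPG, the critic's weights would be updated by minimizing the squared temporal-difference loss against a target Q-value, and the actor's weights by ascending the critic's gradient with respect to the action.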