Conference: International Conference on Automatic Control and Artificial Intelligence

Natural gradient based reinforcement learning algorithm using active stimulating

Abstract

The Episodic Natural Actor-Critic (eNAC) algorithm is an important direct policy search algorithm that guarantees an unbiased natural gradient estimate and, in theory, good learning results. Its major drawback is the system reset assumption. A novel algorithm, active stimulating based eNAC (AS-eNAC), is proposed to relax this restrictive assumption. AS-eNAC extends eNAC by introducing an active stimulating procedure into the interaction process, which generates informative episodes automatically. Because the generated episodes may start from different initial states, which violates a prerequisite of eNAC's natural gradient estimation method, a linear approximator of the initial-state value function is employed in the estimation process to improve the accuracy of the estimated natural gradient. Simulation results on cart-pole balancing demonstrate the efficiency of the proposed algorithm.
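
The abstract does not spell out the estimator, so the following is only a rough sketch of the idea it describes, not the authors' implementation. It assumes the regression form commonly used for episodic natural actor-critic methods: for each episode, the discounted return is regressed on the discounted sum of policy score vectors, and the constant baseline is replaced by a linear function of features of the episode's initial state. The function names, the feature choice, and the `ridge` parameter are hypothetical, and the active stimulating procedure that generates the episodes is not shown.

```python
from typing import List, Sequence, Tuple


def solve_linear(A: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system; the episodes are not informative enough")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def enac_natural_gradient(
    score_sums: Sequence[Sequence[float]],
    start_features: Sequence[Sequence[float]],
    returns: Sequence[float],
    ridge: float = 1e-8,
) -> Tuple[List[float], List[float]]:
    """Least-squares estimate of the natural gradient from a batch of episodes.

    score_sums[e]     : sum_t gamma^t * grad_theta log pi(a_t | s_t) for episode e
    start_features[e] : features psi(s_0) of the initial state of episode e
    returns[e]        : discounted return sum_t gamma^t * r_t of episode e

    Fits returns[e] ~ score_sums[e] . w + start_features[e] . v and returns (w, v):
    w is the natural gradient estimate, v parameterizes the linear approximation
    of the initial-state value (assumed form, see the lead-in).
    """
    rows = [list(p) + list(q) for p, q in zip(score_sums, start_features)]
    d = len(rows[0])
    # Normal equations (X^T X + ridge * I) beta = X^T R.
    A = [
        [sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0) for j in range(d)]
        for i in range(d)
    ]
    b = [sum(r[i] * ret for r, ret in zip(rows, returns)) for i in range(d)]
    beta = solve_linear(A, b)
    k = len(score_sums[0])
    return beta[:k], beta[k:]
```

If every episode started from the same state, `start_features` could be the single constant feature `[1.0]`, which reduces the fit to a constant baseline. The returned `w` would then drive a policy update of the form theta + alpha * w, where the step size alpha is a further assumption.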