Natural gradient based reinforcement learning algorithm using active stimulating



Abstract

The Episodic Natural Actor-Critic (eNAC) algorithm is an important direct policy search algorithm that guarantees an unbiased natural gradient estimate and, in theory, good learning performance. It has one major drawback, however: it relies on the system reset assumption. A novel algorithm, active stimulating based eNAC (AS-eNAC), is proposed to relax this restrictive assumption. AS-eNAC extends eNAC by introducing an active stimulating procedure into the interaction process so that informative episodes are generated automatically. Because the generated episodes may start from different initial states, which violates a prerequisite of eNAC's natural gradient estimation method, a linear approximator of the initial state value function is employed in the estimation process to improve the accuracy of the estimated natural gradient. Simulation results on cart-pole balancing demonstrate the efficiency of the proposed algorithm.
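The abstract gives no equations, so the following is only a minimal sketch of how such an estimate might be set up, not the authors' implementation. It assumes the commonly described eNAC formulation, in which the natural gradient w is obtained by least-squares regression of each episode's discounted return on that episode's discounted sum of score vectors plus a term for the initial state value. With a fixed start state that term is a single constant; with varying start states it is replaced here by a linear model over initial-state features. The function name `episodic_natural_gradient`, its parameters, and the synthetic data are all hypothetical.

```python
import numpy as np


def episodic_natural_gradient(score_sums, returns, init_features=None):
    """Least-squares natural gradient estimate in the style of eNAC (sketch).

    score_sums:    (E, d) array; row e is sum_t gamma^t * grad log pi(a_t | s_t)
    returns:       (E,) array; entry e is sum_t gamma^t * r_t
    init_features: (E, k) array of initial-state features phi(s_0), or None
                   to model the initial state value as one constant
                   (the fixed-start-state case).

    Returns (w, v): the natural gradient estimate and the weights of the
    initial state value model.
    """
    score_sums = np.asarray(score_sums, dtype=float)
    returns = np.asarray(returns, dtype=float)
    n_episodes, d = score_sums.shape
    if init_features is None:
        init_features = np.ones((n_episodes, 1))
    init_features = np.asarray(init_features, dtype=float)

    # One regression row per episode: score_sum . w + phi(s_0) . v = return
    design = np.hstack([score_sums, init_features])
    solution, *_ = np.linalg.lstsq(design, returns, rcond=None)
    return solution[:d], solution[d:]


# Synthetic illustration only: random numbers stand in for real rollouts.
rng = np.random.default_rng(0)
demo_scores = rng.normal(size=(40, 3))
demo_phi = np.hstack([np.ones((40, 1)), rng.normal(size=(40, 2))])
demo_returns = demo_scores @ np.array([1.0, -2.0, 0.5]) + demo_phi @ np.array([3.0, 0.1, -0.4])
w_hat, v_hat = episodic_natural_gradient(demo_scores, demo_returns, demo_phi)
```

Passing `init_features=None` recovers the single-constant case that corresponds to a fixed initial state, which makes the role of the linear approximator easy to compare against.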


Bibliographic record

Source: International Conference on Automatic Control and Artificial Intelligence, 2012, pp. 1377-1380 (4 pages)
Conference location: Xiamen (CN)
Authors: Hao, Chuanchuan; Fang, Zhou; Li, Ping
Affiliation: Institute of Industrial Process Control, Zhejiang University, Hangzhou, China 310027
Original format: PDF
Keywords: active stimulating; cart-pole balancing; natural gradient estimate; reinforcement learning (RL)

