Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning

Thirteenth Annual Conference on Computational Learning Theory, Jun 28-Jul 1, 2000, Palo Alto, California


Abstract

We model reinforcement learning as the problem of learning to control a Partially Observable Markov Decision Process (POMDP), and focus on gradient ascent approaches to this problem. In [3] we introduced GPOMDP, an algorithm for estimating the performance gradient of a POMDP from a single sample path, and we proved that this algorithm almost surely converges to an approximation to the gradient. In this paper, we provide a convergence rate for the estimates produced by GPOMDP, and give an improved bound on the approximation error of these estimates. Both of these bounds are in terms of mixing times of the POMDP.
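The abstract only names the estimator, so, purely as an illustration, the following is a minimal Python sketch of a GPOMDP-style gradient estimate computed from a single sample path: a discounted eligibility trace of score functions is averaged against rewards. The toy two-state chain, the logistic policy, and all function names are hypothetical stand-ins, not taken from the paper; the discount parameter beta governs the trade-off between the variance of the estimate and its bias relative to the true gradient, which is where the mixing-time bounds enter.

```python
import numpy as np

# Toy 2-state, 2-action controlled chain (a degenerate, fully observed POMDP).
# These dynamics are invented for illustration; they are not from the paper.
REWARD = np.array([0.0, 1.0])  # reward for being in state 0 / state 1

def env_step(state, action, rng):
    """Action a moves the chain toward state a with probability 0.9."""
    next_state = action if rng.random() < 0.9 else 1 - action
    return next_state, REWARD[next_state]

def policy_prob(theta, obs):
    """Probability of choosing action 1 in observation `obs` (logistic in theta[obs])."""
    return 1.0 / (1.0 + np.exp(-theta[obs]))

def grad_log_policy(theta, obs, action):
    """Score function: gradient of log pi(action | obs, theta)."""
    g = np.zeros_like(theta)
    p1 = policy_prob(theta, obs)
    g[obs] = (1.0 - p1) if action == 1 else -p1
    return g

def gpomdp_estimate(theta, beta=0.9, T=100_000, seed=0):
    """GPOMDP-style gradient estimate from a single sample path.

    z is a discounted eligibility trace of score functions; the running
    average of reward * z approximates the performance gradient.  Larger
    beta reduces the approximation bias but increases estimation variance,
    with both effects tied to how quickly the chain mixes.
    """
    rng = np.random.default_rng(seed)
    state = 0
    z = np.zeros_like(theta)       # eligibility trace
    delta = np.zeros_like(theta)   # running gradient estimate
    for t in range(T):
        obs = state                                        # fully observed toy case
        action = int(rng.random() < policy_prob(theta, obs))
        z = beta * z + grad_log_policy(theta, obs, action)
        state, reward = env_step(state, action, rng)
        delta += (reward * z - delta) / (t + 1)             # running average
    return delta

if __name__ == "__main__":
    theta = np.zeros(2)            # uniform random policy to start
    print("estimated gradient:", gpomdp_estimate(theta))
```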

