Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning

Thirteenth Annual Conference on Computational Learning Theory, Jun 28-Jul 1, 2000, Palo Alto, California


Abstract

We model reinforcement learning as the problem of learning to control a Partially Observable Markov Decision Process (POMDP), and focus on gradient ascent approaches to this problem. In [3] we introduced GPOMDP, an algorithm for estimating the performance gradient of a POMDP from a single sample path, and we proved that this algorithm almost surely converges to an approximation to the gradient. In this paper, we provide a convergence rate for the estimates produced by GPOMDP, and give an improved bound on the approximation error of these estimates. Both of these bounds are in terms of mixing times of the POMDP.
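The abstract only names the estimator, so, purely as an illustration, the following is a minimal Python sketch of a GPOMDP-style gradient estimate computed from a single sample path: a discounted eligibility trace of score functions is averaged against rewards. The toy two-state chain, the logistic policy, and all function names are hypothetical stand-ins, not taken from the paper; the discount parameter beta governs the trade-off between the variance of the estimate and its bias relative to the true gradient, which is where the mixing-time bounds enter.

```python
import numpy as np

# Toy 2-state, 2-action controlled chain (a degenerate, fully observed POMDP).
# These dynamics are invented for illustration; they are not from the paper.
REWARD = np.array([0.0, 1.0])  # reward for being in state 0 / state 1

def env_step(state, action, rng):
    """Action a moves the chain toward state a with probability 0.9."""
    next_state = action if rng.random() < 0.9 else 1 - action
    return next_state, REWARD[next_state]

def policy_prob(theta, obs):
    """Probability of choosing action 1 in observation `obs` (logistic in theta[obs])."""
    return 1.0 / (1.0 + np.exp(-theta[obs]))

def grad_log_policy(theta, obs, action):
    """Score function: gradient of log pi(action | obs, theta)."""
    g = np.zeros_like(theta)
    p1 = policy_prob(theta, obs)
    g[obs] = (1.0 - p1) if action == 1 else -p1
    return g

def gpomdp_estimate(theta, beta=0.9, T=100_000, seed=0):
    """GPOMDP-style gradient estimate from a single sample path.

    z is a discounted eligibility trace of score functions; the running
    average of reward * z approximates the performance gradient.  Larger
    beta reduces the approximation bias but increases estimation variance,
    with both effects tied to how quickly the chain mixes.
    """
    rng = np.random.default_rng(seed)
    state = 0
    z = np.zeros_like(theta)       # eligibility trace
    delta = np.zeros_like(theta)   # running gradient estimate
    for t in range(T):
        obs = state                                        # fully observed toy case
        action = int(rng.random() < policy_prob(theta, obs))
        z = beta * z + grad_log_policy(theta, obs, action)
        state, reward = env_step(state, action, rng)
        delta += (reward * z - delta) / (t + 1)             # running average
    return delta

if __name__ == "__main__":
    theta = np.zeros(2)            # uniform random policy to start
    print("estimated gradient:", gpomdp_estimate(theta))
```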

