Conference: Yale Workshop on Adaptive and Learning Systems

Approximate Dynamic Programming: From Optimization in Policy Space to Actor-Critic Methods



Abstract

We consider simulation-based approximation methods for large-scale dynamic programming problems. We discuss briefly methods relying on value function approximation ('reinforcement learning' or 'neuro-dynamic programming'). We then consider policy-space methods in which we start with a parametric class of policies, and tune the policy parameters using gradient descent. Such methods may suffer from a large variance, and we indicate approaches for variance reduction. We conclude with a discussion of actor-critic algorithms. These are two-time-scale algorithms in which the critic uses temporal difference (TD) learning with a linearly parameterized approximation architecture, and the actor is updated in an approximate gradient direction based on information provided by the critic. We indicate that the features for the critic should ideally span a subspace prescribed by the choice of parameterization of the actor, and state some convergence results.
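The following is a minimal sketch, not the authors' algorithm or code, of the kind of two-time-scale actor-critic scheme the abstract describes: a softmax-parameterized actor, a critic running linear TD(0) over "compatible" features phi(s, a) = grad_theta log pi_theta(a | s) (so the features span the subspace prescribed by the actor's parameterization), and an actor step in the approximate gradient direction supplied by the critic. The toy MDP, step-size schedules, and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 3, 2
# Hypothetical MDP for illustration: random transition kernel P[s, a, s'] and rewards R[s, a].
P = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))
R = rng.uniform(0.0, 1.0, size=(N_STATES, N_ACTIONS))
GAMMA = 0.95

def policy(theta, s):
    """Softmax policy pi_theta(a | s), one parameter per (state, action) pair."""
    logits = theta[s]
    z = np.exp(logits - logits.max())
    return z / z.sum()

def compatible_features(theta, s, a):
    """phi(s, a) = grad_theta log pi_theta(a | s), flattened to match theta."""
    grad = np.zeros_like(theta)
    pi = policy(theta, s)
    grad[s] = -pi
    grad[s, a] += 1.0
    return grad.ravel()

theta = np.zeros((N_STATES, N_ACTIONS))   # actor parameters
w = np.zeros(theta.size)                  # critic weights, linear in phi

s = 0
for t in range(1, 200_000):
    # Two time scales: the critic's step size decays more slowly than the actor's,
    # so the critic tracks the value of the slowly changing current policy.
    alpha_critic = 1.0 / (t ** 0.6)
    alpha_actor = 1.0 / t

    pi = policy(theta, s)
    a = rng.choice(N_ACTIONS, p=pi)
    s_next = rng.choice(N_STATES, p=P[s, a])
    r = R[s, a]

    # Critic: linear TD(0) update for Q_w(s, a) = w . phi(s, a).
    phi = compatible_features(theta, s, a)
    a_next = rng.choice(N_ACTIONS, p=policy(theta, s_next))
    phi_next = compatible_features(theta, s_next, a_next)
    td_error = r + GAMMA * (w @ phi_next) - (w @ phi)
    w += alpha_critic * td_error * phi

    # Actor: approximate gradient step, with the critic's estimate w . phi(s, a)
    # standing in for the true Q-value.
    theta += alpha_actor * (w @ phi) * phi.reshape(theta.shape)

    s = s_next

for s in range(N_STATES):
    print("state", s, "policy", np.round(policy(theta, s), 3))
```

Using the likelihood-ratio features as the critic's basis is what makes the critic's low-dimensional estimate sufficient for the actor's gradient step; with arbitrary features the actor update would generally be biased.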

