Conference: Yale Workshop on Adaptive and Learning Systems

Approximate Dynamic Programming: From Optimization in Policy Space to Actor-Critic Methods



Abstract

We consider simulation-based approximation methods for large-scale dynamic programming problems. We discuss briefly methods relying on value function approximation ('reinforcement learning' or 'neuro-dynamic programming'). We then consider policy-space methods in which we start with a parametric class of policies, and tune the policy parameters using gradient descent. Such methods may suffer from a large variance, and we indicate approaches for variance reduction. We conclude with a discussion of actor-critic algorithms. These are two-time-scale algorithms in which the critic uses temporal difference (TD) learning with a linearly parameterized approximation architecture, and the actor is updated in an approximate gradient direction based on information provided by the critic. We indicate that the features for the critic should ideally span a subspace prescribed by the choice of parameterization of the actor, and state some convergence results.
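The following is a minimal sketch, not the authors' algorithm or code, of the kind of two-time-scale actor-critic scheme the abstract describes: a softmax-parameterized actor, a critic running linear TD(0) over "compatible" features phi(s, a) = grad_theta log pi_theta(a | s) (so the features span the subspace prescribed by the actor's parameterization), and an actor step in the approximate gradient direction supplied by the critic. The toy MDP, step-size schedules, and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 3, 2
# Hypothetical MDP for illustration: random transition kernel P[s, a, s'] and rewards R[s, a].
P = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))
R = rng.uniform(0.0, 1.0, size=(N_STATES, N_ACTIONS))
GAMMA = 0.95

def policy(theta, s):
    """Softmax policy pi_theta(a | s), one parameter per (state, action) pair."""
    logits = theta[s]
    z = np.exp(logits - logits.max())
    return z / z.sum()

def compatible_features(theta, s, a):
    """phi(s, a) = grad_theta log pi_theta(a | s), flattened to match theta."""
    grad = np.zeros_like(theta)
    pi = policy(theta, s)
    grad[s] = -pi
    grad[s, a] += 1.0
    return grad.ravel()

theta = np.zeros((N_STATES, N_ACTIONS))   # actor parameters
w = np.zeros(theta.size)                  # critic weights, linear in phi

s = 0
for t in range(1, 200_000):
    # Two time scales: the critic's step size decays more slowly than the actor's,
    # so the critic tracks the value of the slowly changing current policy.
    alpha_critic = 1.0 / (t ** 0.6)
    alpha_actor = 1.0 / t

    pi = policy(theta, s)
    a = rng.choice(N_ACTIONS, p=pi)
    s_next = rng.choice(N_STATES, p=P[s, a])
    r = R[s, a]

    # Critic: linear TD(0) update for Q_w(s, a) = w . phi(s, a).
    phi = compatible_features(theta, s, a)
    a_next = rng.choice(N_ACTIONS, p=policy(theta, s_next))
    phi_next = compatible_features(theta, s_next, a_next)
    td_error = r + GAMMA * (w @ phi_next) - (w @ phi)
    w += alpha_critic * td_error * phi

    # Actor: approximate gradient step, with the critic's estimate w . phi(s, a)
    # standing in for the true Q-value.
    theta += alpha_actor * (w @ phi) * phi.reshape(theta.shape)

    s = s_next

for s in range(N_STATES):
    print("state", s, "policy", np.round(policy(theta, s), 3))
```

Using the likelihood-ratio features as the critic's basis is what makes the critic's low-dimensional estimate sufficient for the actor's gradient step; with arbitrary features the actor update would generally be biased.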

