Recursive Learning Automata for Control of Partially Observable Markov Decision Processes


Abstract

This paper presents a sampling algorithm, called 'Recursive Automata Sampling Algorithm (RASA),' for control of finite horizon information-state Markov decision processes (MDPs), the equivalent model of partially observable MDPs. RASA extends in a recursive manner the Pursuit algorithm designed with learning automata by Rajaraman and Sastry for solving stochastic optimization problems. Based on the finite-time analysis of the Pursuit algorithm, we analyze the finite-time behavior of RASA, providing a bound on the probability that a given initial state takes the optimal action, and a bound on the probability that the difference between the optimal value and the estimate of it exceeds a given error. We also discuss how to apply RASA in the direct context of POMDPs and how to incorporate heuristic knowledge into RASA for on-line control.
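The abstract names the Pursuit algorithm as the building block that RASA applies recursively, but it does not spell out the update rule. As a rough illustration only, the following sketch shows a generic pursuit-type learning automaton for a single-stage stochastic optimization problem: it samples an action from a probability vector, updates a running reward estimate for that action, and then shifts the probability vector toward the action with the highest current estimate. The function name `pursuit`, the callback `sample_reward`, the learning rate `mu`, and the reward values in the usage lines are all assumptions made for this example. This is not the authors' RASA, which adds the recursion over the horizon of the information-state MDP.

```python
import random


def pursuit(sample_reward, num_actions, num_iters, mu=0.05, seed=0):
    """Generic pursuit learning automaton (illustrative sketch, not RASA).

    sample_reward(action, rng) is a hypothetical callback returning a
    (possibly noisy) reward for the chosen action, assumed non-negative.
    """
    rng = random.Random(seed)
    probs = [1.0 / num_actions] * num_actions   # action probability vector
    counts = [0] * num_actions                  # times each action was sampled
    means = [0.0] * num_actions                 # running reward estimates

    for _ in range(num_iters):
        # Sample an action according to the current probability vector.
        action = rng.choices(range(num_actions), weights=probs)[0]
        reward = sample_reward(action, rng)

        # Incrementally update the sample mean of the chosen action.
        counts[action] += 1
        means[action] += (reward - means[action]) / counts[action]

        # "Pursue" the action with the highest estimate:
        # probs <- (1 - mu) * probs + mu * unit_vector(best).
        best = max(range(num_actions), key=lambda i: means[i])
        probs = [(1.0 - mu) * p for p in probs]
        probs[best] += mu

    return probs, means


# Usage with made-up Bernoulli success rates (assumed values).
success_rates = [0.2, 0.5, 0.8]
probs, means = pursuit(
    lambda a, rng: 1.0 if rng.random() < success_rates[a] else 0.0,
    num_actions=3,
    num_iters=5000,
    mu=0.01,
)
print(probs, means)
```

A smaller `mu` makes the automaton commit more slowly, which leaves more time for the reward estimates to become accurate before the probability vector concentrates on one action.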


Bibliographic details

Source
IEE Colloquium on Why aren't we Training Measurement Engineers?, 1992 | 1992 | pp. 6091-6096 | 6 pages
Authors
Hyeong Soo Chang; Fu, M.C.; Marcus, S.I.
Author affiliations

Department of Computer Science and Engineering Sogang University Seoul Korea;

Program of Integrated Biotechnology at Sogang University. hschang@sogang.ac.kr;

Original format: PDF

Similar literature

1. Modeling Human Recursive Reasoning Using Empirically Informed Interactive Partially Observable Markov Decision Processes [J]. Doshi P., Qu X., Goodie A. S., Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on. 2012, No. 6
2. Recursive Learning Automata Approach to Markov Decision Processes [J]. Chang H. S., Fu M. C., Hu J., IEEE Transactions on Automatic Control. 2007, No. 7
3. Stochastic Predictive Control for Partially Observable Markov Decision Processes With Time-Joint Chance Constraints and Application to Autonomous Vehicle Control [J]. Li Nan, Girard Anouck, Kolmanovsky Ilya, Journal of Dynamic Systems, Measurement, and Control. 2019, No. 7
4. Recursive Learning Automata for Control of Partially Observable Markov Decision Processes [C]. Hyeong Soo Chang, Michael C. Fu, Steven I. Marcus, IEEE Conference on Decision and Control. 2005
5. Learning partially observable Markov decision processes using abstract actions [D]. Janzadeh, Hamed. 2012
6. Decision Making Under Uncertainty: A Neural Model Based on Partially Observable Markov Decision Processes [O]. Rajesh P. N. Rao. 2010
7. Stochastic Optimization of Controlled Partially Observable Markov Decision Processes [O]. Peter L. Bartlett, Jonathan Baxter 100
8. Cooperation and Coordination Between Fuzzy Reinforcement Learning Agents in Continuous State Partially Observable Markov Decision Processes [R]. Berenji, Hamid R., Vengerov, David. 1999
