Conference: IEE Colloquium on Why aren't we Training Measurement Engineers?, 1992

Recursive Learning Automata for Control of Partially Observable Markov Decision Processes



Abstract

This paper presents a sampling algorithm, called the Recursive Automata Sampling Algorithm (RASA), for controlling finite-horizon information-state Markov decision processes (MDPs), which are equivalent models of partially observable MDPs (POMDPs). RASA recursively extends the Pursuit algorithm, a learning-automata method designed by Rajaraman and Sastry for solving stochastic optimization problems. Building on the finite-time analysis of the Pursuit algorithm, we analyze the finite-time behavior of RASA, providing a bound on the probability that the optimal action is taken at a given initial state, and a bound on the probability that the difference between the optimal value and its estimate exceeds a given error. We also discuss how to apply RASA directly to POMDPs and how to incorporate heuristic knowledge into RASA for on-line control.
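To make the "recursive Pursuit" idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a generic finite-horizon MDP with a sampling simulator rather than an information-state MDP, and the names `step`, `rasa`, `pursuit_update`, `n_iters` and `mu` (the Pursuit learning rate) are hypothetical. At each state and stage, a learning automaton keeps a probability vector over actions, samples an action, and scores it by its sampled reward plus a recursive estimate of the value one stage later. After each sample, the Pursuit step moves the probability vector toward the action with the best sample-mean estimate. Returning the largest sample-mean Q-value as the value estimate is an assumed choice; the paper's estimator may differ.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical simulator interface (an assumption, not from the paper):
# step(state, action, rng) -> (reward, next_state)
Simulator = Callable[[int, int, random.Random], Tuple[float, int]]


def pursuit_update(p: List[float], best: int, mu: float) -> List[float]:
    """Pursuit step: move the action probabilities toward the unit vector of `best`."""
    return [(1.0 - mu) * pa + (mu if a == best else 0.0) for a, pa in enumerate(p)]


def rasa(state: int, stage: int, horizon: int, n_actions: int, step: Simulator,
         n_iters: int, mu: float, rng: random.Random) -> Tuple[float, int]:
    """Return (estimated optimal value-to-go, most probable action) at `state`."""
    if stage == horizon:
        return 0.0, -1  # assumed zero terminal reward; no action at the horizon
    if n_iters < 1:
        raise ValueError("n_iters must be at least 1")
    p = [1.0 / n_actions] * n_actions  # the automaton's action probabilities
    q_sum = [0.0] * n_actions          # sum of sampled Q-values per action
    counts = [0] * n_actions           # number of samples per action
    means: List[float] = []
    for _ in range(n_iters):
        a = rng.choices(range(n_actions), weights=p)[0]
        reward, nxt = step(state, a, rng)
        # Recursive part: the sampled Q-value uses a RASA estimate one stage later.
        next_value, _ = rasa(nxt, stage + 1, horizon, n_actions, step,
                             n_iters, mu, rng)
        q_sum[a] += reward + next_value
        counts[a] += 1
        # Pursuit part: chase the action with the best sample-mean estimate so far.
        means = [q_sum[b] / counts[b] if counts[b] else float("-inf")
                 for b in range(n_actions)]
        best = max(range(n_actions), key=means.__getitem__)
        p = pursuit_update(p, best, mu)
    value = max(means)  # assumed estimator: largest sample-mean Q-value
    action = max(range(n_actions), key=p.__getitem__)
    return value, action


if __name__ == "__main__":
    def two_armed(state: int, action: int, rng: random.Random) -> Tuple[float, int]:
        # Toy single-state MDP: action 1 pays 1, action 0 pays 0.
        return float(action), 0

    print(rasa(0, 0, 2, 2, two_armed, n_iters=40, mu=0.05, rng=random.Random(0)))
```

Because every stage calls the automaton again for each sampled next state, the cost grows as `n_iters ** horizon`. This is why a sampling method of this kind suits short horizons or on-line use with a limited look-ahead.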
