...
The Journal of Artificial Intelligence Research

Optimal and Approximate Q-value Functions for Decentralized POMDPs

Abstract

Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
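The single-agent recipe the abstract refers to (compute Q* recursively by dynamic programming, then extract a policy from it) can be sketched as follows. This is a minimal illustration, not code from the paper: it assumes a finite-horizon, undiscounted, fully observable MDP, and the two-state model (`s0`, `s1`, actions `stay`/`go`) as well as the helper names `q_value_dp` and `greedy_policy` are hypothetical.

```python
"""Finite-horizon Q-value dynamic programming for a toy MDP (illustrative sketch).

Assumption: the transition and reward tables below are hypothetical and
are not taken from the paper.
"""
from typing import Dict, List, Tuple

State = str
Action = str
QTable = Dict[State, Dict[Action, float]]

# Hypothetical model: T[s][a] -> list of (next_state, probability),
# R[s][a] -> immediate reward for taking action a in state s.
T: Dict[State, Dict[Action, List[Tuple[State, float]]]] = {
    "s0": {"stay": [("s0", 1.0)], "go": [("s1", 0.8), ("s0", 0.2)]},
    "s1": {"stay": [("s1", 1.0)], "go": [("s0", 1.0)]},
}
R: Dict[State, Dict[Action, float]] = {
    "s0": {"stay": 0.0, "go": 0.0},
    "s1": {"stay": 1.0, "go": 0.0},
}


def q_value_dp(horizon: int) -> List[QTable]:
    """Return q[t][s][a] for stages t = 0..horizon-1, computed backwards in time."""
    q: List[QTable] = [{} for _ in range(horizon)]
    next_v = {s: 0.0 for s in T}  # value after the final stage is zero
    for t in reversed(range(horizon)):
        for s, actions in T.items():
            # Q_t(s, a) = R(s, a) + sum_{s'} P(s' | s, a) * V_{t+1}(s')
            q[t][s] = {
                a: R[s][a] + sum(p * next_v[s2] for s2, p in outcomes)
                for a, outcomes in actions.items()
            }
        # V_t(s) = max_a Q_t(s, a), consumed by stage t - 1
        next_v = {s: max(q[t][s].values()) for s in T}
    return q


def greedy_policy(q: List[QTable]) -> List[Dict[State, Action]]:
    """Extract a time-indexed policy by taking argmax_a Q_t(s, a) in every state."""
    return [{s: max(qs, key=qs.get) for s, qs in q_t.items()} for q_t in q]


if __name__ == "__main__":
    q = q_value_dp(horizon=3)
    print(q[0])
    print(greedy_policy(q)[0])
```

Extraction here is a per-state argmax, which presumes the decision maker observes the state. In a Dec-POMDP the agents receive only individual observations, which is why the paper asks what an analogous Q-value function and policy-extraction step look like in that setting.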