Optimal and Approximate Q-value Functions for Decentralized POMDPs

Frans A. Oliehoek; Matthijs T. J. Spaan; Nikos Vlassis

Abstract

Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
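
The single-agent case mentioned in the abstract (compute Q* recursively by dynamic programming, then extract a policy from it) can be illustrated with a minimal sketch. It assumes a finite-horizon, fully observable MDP; the two-state model, the function names `finite_horizon_q` and `greedy_policy`, and the reward values are all hypothetical and chosen only for illustration. This is not the paper's Dec-POMDP procedure, only the familiar MDP recursion it takes as its starting point.

```python
def finite_horizon_q(T, R, horizon):
    """Backward dynamic programming for a finite-horizon MDP.

    T[s][a][s2] is the probability of moving from s to s2 under action a,
    R[s][a] is the immediate reward. Returns Q with Q[t][s][a] being the
    optimal expected sum of rewards from stage t onward.
    """
    n_states = len(T)
    n_actions = len(T[0])
    Q = [[[0.0] * n_actions for _ in range(n_states)] for _ in range(horizon)]
    for t in reversed(range(horizon)):
        for s in range(n_states):
            for a in range(n_actions):
                future = 0.0
                if t + 1 < horizon:
                    # Expected value of acting optimally from the next stage on.
                    future = sum(
                        T[s][a][s2] * max(Q[t + 1][s2]) for s2 in range(n_states)
                    )
                Q[t][s][a] = R[s][a] + future
    return Q


def greedy_policy(Q):
    """Extract a policy: at each stage and state, pick an action maximizing Q."""
    return [
        [max(range(len(q_sa)), key=q_sa.__getitem__) for q_sa in stage]
        for stage in Q
    ]


# Hypothetical toy model: action 0 stays in place, action 1 switches state.
T = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0
    [[0.0, 1.0], [1.0, 0.0]],  # from state 1
]
# Only staying in state 1 is rewarded.
R = [
    [0.0, 0.0],
    [1.0, 0.0],
]

Q = finite_horizon_q(T, R, horizon=3)
policy = greedy_policy(Q)
print(Q[0])       # stage-0 Q-values for each state
print(policy[0])  # stage-0 greedy action for each state
```

In this toy model the greedy policy at stage 0 switches out of state 0 and stays in state 1. In a Dec-POMDP each agent only has its own observations, so the recursion cannot be applied this directly, and that is the difficulty the abstract refers to.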

Bibliographic details

Source: The Journal of Artificial Intelligence Research, 2008, issue 0, 65 pages
Authors: Frans A. Oliehoek; Matthijs T. J. Spaan; Nikos Vlassis
Original format: PDF
Language: English
CLC classification: Artificial intelligence theory

Similar documents

1. Optimal and Approximate Q-value Functions for Decentralized POMDPs [J]. Oliehoek F. A., Spaan M. T. J., Vlassis N. The Journal of Artificial Intelligence Research. 2008, issue 4
2. Optimal and Approximate Q-value Functions for Decentralized POMDPs [J]. F. A. Oliehoek, M. T. J. Spaan, N. Vlassis. Journal of Automation, Mobile Robotics & Intelligent Systems. 2008, issue 1
3. Optimal and Approximate Q-value Functions for Decentralized POMDPs [J]. Frans A. Oliehoek, Matthijs T. J. Spaan, Nikos Vlassis. The Journal of Artificial Intelligence Research. 2008, issue 0
4. Q-value functions for decentralized POMDPs [C]. Frans A. Oliehoek, Nikos Vlassis. International joint conference on Autonomous agents and multiagent systems. 2007
5. Approximate dynamic programming based solutions for fixed-final-time optimal control and optimal switching [D]. Heydari, Ali. 2013
6. Modeling and Planning with Macro-Actions in Decentralized POMDPs [O]. Christopher Amato, George Konidaris, Leslie P. Kaelbling
7. Optimal and Approximate Q-value Functions for Decentralized POMDPs [O]. Oliehoek, Frans A., Spaan, Matthijs T. J., Vlassis, Nikos. 2011
8. Q-Value Dependence of Inelastic Scattering and Multinucleon Transfer Reactions ²⁷Al + ¹⁶O at 88 MeV. Optimum Q Values and Q-Value Dependence of Angular Distributions of Reaction Products [R]. Mikumo, T., Sasagase, M., Sato, M. 1979
