MILP based value backups in partially observed Markov decision processes (POMDPs) with very large or continuous action and observation spaces

Rakshita Agrawal; Matthew J. Realff; Jay H. Lee

Abstract

Partially observed Markov decision processes (POMDPs) serve as powerful tools to model stochastic systems with partial state information. Since the exact solution methods for POMDPs are limited to problems with very small sizes of state, action and observation spaces, approximate point-based solution methods like PERSEUS have gained popularity. In this work, a mixed integer linear program (MILP) is developed for calculation of exact value updates (in PERSEUS and similar algorithms), when the POMDP has very large or continuous action space. Since the solution time of the MILP is very sensitive to the size of the observation space, the concept of post-decision belief space is introduced to generate a more efficient and flexible model. An example involving a flow network is presented to illustrate the concepts and compare the results with those of the existing techniques.
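For orientation, the value update that the abstract refers to can be sketched for the ordinary case of a small, enumerable action and observation set. The following is a minimal sketch of a standard point-based backup at a single belief point, of the kind used in PERSEUS-style algorithms. It is not the paper's MILP formulation, which targets the case where enumerating actions like this is impractical. The function name and the array layouts (`T[a, s, s']`, `O[a, s', o]`, `R[a, s]`) are assumptions made for illustration.

```python
import numpy as np


def point_based_backup(b, alphas, T, O, R, gamma):
    """Standard point-based value backup at one belief point (illustrative).

    Assumed layouts (hypothetical, chosen for this sketch):
      b      : (S,)      belief over states
      alphas : (K, S)    current set of alpha vectors
      T      : (A, S, S) T[a, s, s'] = P(s' | s, a)
      O      : (A, S, Z) O[a, s', o] = P(o | s', a)
      R      : (A, S)    R[a, s] = immediate reward
    Returns the new alpha vector, the maximizing action, and the value at b.
    """
    n_actions, _, n_obs = O.shape
    best_value, best_alpha, best_action = -np.inf, None, None

    # Enumerate every action: this loop is what becomes infeasible
    # when the action space is very large or continuous.
    for a in range(n_actions):
        g_a = R[a].astype(float).copy()
        for o in range(n_obs):
            # g[i, s] = sum_{s'} T[a, s, s'] * O[a, s', o] * alphas[i, s']
            g = (alphas * O[a, :, o]) @ T[a].T
            # Keep the back-projected vector that is best at this belief.
            g_a += gamma * g[np.argmax(g @ b)]
        value = float(g_a @ b)
        if value > best_value:
            best_value, best_alpha, best_action = value, g_a, a

    return best_alpha, best_action, best_value
```

The inner loop over observations also shows why the cost of a backup grows with the size of the observation space, which is the sensitivity the abstract mentions.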

Bibliographic record

Source
Computers & Chemical Engineering | 2013, Issue 13 | pp. 101-113 | 13 pages
Authors
Rakshita Agrawal; Matthew J. Realff; Jay H. Lee
Author affiliations

School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, GA 30022, USA;

School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, GA 30022, USA;

Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea;

Original format: PDF
Language: English
Keywords
Markov decision processes; Dynamic programming; Mathematical programming; Partial observation; Network reliability;