Source: Journal of Automation, Mobile Robotics & Intelligent Systems

Learning Partially Observable Deterministic Action Models

Abstract

We present exact algorithms for identifying the effects and preconditions of deterministic actions in dynamic, partially observable domains. They apply when one does not know the action model (the way actions affect the world) of a domain and must learn it from partial observations over time. Such scenarios are common in real-world applications. They are challenging for AI tasks because the traditional domain structures that underlie tractability (e.g., conditional independence) fail there (e.g., world features become correlated). Our work departs from traditional assumptions about partial observations and action models. In particular, it focuses on problems in which actions are deterministic and of simple logical structure, and in which observation models have all features observed with some frequency. We obtain tractable algorithms for the modified problem in such domains. Our algorithms take sequences of partial observations over time as input, and output deterministic action models that could have led to those observations. The algorithms output all or one of those models (depending on our choice), and are exact in that no model is misclassified given the observations. Our algorithms take polynomial time in the number of time steps and state features for some traditional action classes examined in the AI-planning literature, e.g., STRIPS actions. In contrast, traditional approaches for HMMs and Reinforcement Learning are inexact and exponentially intractable for such domains. Our experiments verify the theoretical tractability guarantees, and show that we identify action models exactly. Several applications in planning, autonomous exploration, and adventure-game playing already use these results. They are also promising for probabilistic settings, partially observable reinforcement learning, and diagnosis.
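To make the input/output contract described in the abstract concrete, the following is a minimal sketch of the problem statement only. It is not the authors' algorithm: it enumerates every candidate model by brute force (exponential in the number of features and actions), whereas the abstract claims polynomial time for classes such as STRIPS. It also assumes that actions have no preconditions and that each action either sets a feature true, sets it false, or leaves it unchanged. The feature names, action names, and function names are hypothetical.

```python
from itertools import product

FEATURES = ("locked", "open")   # hypothetical state features
ACTIONS = ("unlock", "push")    # hypothetical action names
EFFECTS = (True, False, None)   # set true, set false, leave unchanged


def apply(model, action, state):
    """Apply a deterministic action: each feature is overwritten or kept."""
    return tuple(
        value if model[action, f] is None else model[action, f]
        for f, value in zip(FEATURES, state)
    )


def consistent(model, actions, observations):
    """Return True if some initial state reproduces every partial observation."""
    for initial in product((True, False), repeat=len(FEATURES)):
        state, ok = initial, True
        for t, obs in enumerate(observations):
            if t > 0:
                state = apply(model, actions[t - 1], state)
            # An observation mentions only the features seen at step t.
            if any(state[FEATURES.index(f)] != v for f, v in obs.items()):
                ok = False
                break
        if ok:
            return True
    return False


def learn(actions, observations):
    """Return all candidate models that could have led to the observations."""
    keys = [(a, f) for a in ACTIONS for f in FEATURES]
    models = []
    for combo in product(EFFECTS, repeat=len(keys)):
        model = dict(zip(keys, combo))
        if consistent(model, actions, observations):
            models.append(model)
    return models


# Three partial observations around two actions: unlock, then push.
models = learn(
    ["unlock", "push"],
    [{"locked": True}, {"locked": False}, {"open": True}],
)
print(len(models), "candidate models remain")
```

In this toy run every surviving model agrees that `unlock` sets `locked` to false, while the effect of `push` on `locked` stays undetermined because `locked` is never observed after `push`. This illustrates why the abstract's condition that all features be observed with some frequency matters for identifying a model.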