APPARATUS AND METHOD OF POLICY MODELING BASED ON PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES

Abstract

PURPOSE: An apparatus and a method for modeling a policy based on partially observable Markov decision processes (POMDPs) are provided. An algebraic decision diagram (ADD) is applied to calculate the upper and lower bounds used by heuristic search value iteration (HSVI), thereby reducing policy training time. CONSTITUTION: A problem defining unit (310) defines the problem in terms of mathematical parameters. A bound calculating unit (320) calculates the upper and lower bounds by applying an ADD. A policy training unit (330) trains a policy by determining actions according to the partially observable Markov decision process. A policy outputting unit (340) outputs the policy.
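
To make the role of the two bounds concrete, the following is a minimal sketch of how HSVI-style initial upper and lower bounds on the value function can be computed for a belief state. It is not the patented method: it uses flat Python lists instead of ADDs, and the two-state "tiger" problem, the reward table `R`, the transition function `T`, the discount `GAMMA`, and all function names are assumptions introduced purely for illustration. The lower bound here is the common worst-case "repeat one action forever" bound, and the upper bound is the value of the underlying fully observable MDP.

```python
# Hypothetical toy POMDP (the classic "tiger" problem), for illustration only.
GAMMA = 0.95
STATES = [0, 1]       # 0: tiger behind left door, 1: tiger behind right door
ACTIONS = [0, 1, 2]   # 0: listen, 1: open left door, 2: open right door

# R[s][a]: immediate reward for taking action a in state s (assumed values).
R = [[-1.0, -100.0, 10.0],
     [-1.0, 10.0, -100.0]]


def T(s, a, s2):
    """Transition probability P(s2 | s, a): listening keeps the state,
    opening a door resets the tiger's position uniformly at random."""
    if a == 0:
        return 1.0 if s == s2 else 0.0
    return 0.5


def lower_bound_alpha():
    """One alpha-vector: the worst-case return of repeating the best single action."""
    best = max(min(R[s][a] for s in STATES) for a in ACTIONS)
    return [best / (1.0 - GAMMA) for _ in STATES]


def mdp_values(tol=1e-9):
    """Value iteration on the fully observable MDP; its values bound the POMDP from above."""
    v = [0.0 for _ in STATES]
    while True:
        new_v = [
            max(R[s][a] + GAMMA * sum(T(s, a, s2) * v[s2] for s2 in STATES)
                for a in ACTIONS)
            for s in STATES
        ]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def bounds_at(belief):
    """Return (lower, upper) bounds on the optimal value at a belief state."""
    alpha = lower_bound_alpha()
    v_mdp = mdp_values()
    lower = sum(b * a for b, a in zip(belief, alpha))
    upper = sum(b * v for b, v in zip(belief, v_mdp))
    return lower, upper


if __name__ == "__main__":
    lo, hi = bounds_at([0.5, 0.5])
    print(f"lower={lo:.3f} upper={hi:.3f} gap={hi - lo:.3f}")
```

HSVI repeatedly explores beliefs where the gap between the two bounds is large and tightens both bounds there until the gap at the initial belief falls below a chosen threshold. In this toy problem the initial bounds at the uniform belief are roughly -20 and 200. The abstract's claim is that representing these bound computations with ADDs, rather than with flat tables as in this sketch, shortens training time.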
