Journal: IEEE Robotics and Automation Letters

Bi-Directional Value Learning for Risk-Aware Planning Under Uncertainty

Abstract

Decision-making under uncertainty is a crucial ability for autonomous systems. In its most general form, this problem can be formulated as a partially observable Markov decision process (POMDP). The solution policy of a POMDP can be implicitly encoded as a value function. In partially observable settings, the value function is typically learned via forward simulation of the system evolution. Focusing on accurate and long-range risk assessment, we propose a novel method, where the value function is learned in different phases via a bi-directional search in belief space. A backward value learning process provides a long-range and risk-aware base policy. A forward value learning process ensures local optimality and updates the policy via forward simulations. We consider a class of scalable and continuous-space rover navigation problems to assess the safety, scalability, and optimality of the proposed algorithm. The results demonstrate the capabilities of the proposed algorithm in evaluating long-range risk/safety of the planner while addressing continuous problems with long planning horizons.
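
To make the notion of a "belief" concrete, the sketch below shows a standard discrete Bayes-filter belief update, the basic step that forward simulation in belief space repeats. It is not the paper's algorithm, which targets continuous spaces and is not specified in the abstract. The function name `belief_update`, the tables `T` and `O`, and the two-state numbers are all hypothetical and chosen only for illustration.

```python
def belief_update(belief, action, observation, T, O):
    """One discrete Bayes-filter step (illustrative assumption, not the paper's method).

    belief: list of probabilities over states
    T[a][s][s2]: probability of moving from state s to s2 under action a
    O[a][s2][o]: probability of observing o after reaching s2 under action a
    """
    n = len(belief)
    # Prediction: push the current belief through the transition model.
    predicted = [sum(belief[s] * T[action][s][s2] for s in range(n))
                 for s2 in range(n)]
    # Correction: weight each predicted state by the observation likelihood.
    unnormalized = [O[action][s2][observation] * predicted[s2]
                    for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


# Hypothetical two-state, one-action, two-observation model.
T = [[[0.9, 0.1],
      [0.2, 0.8]]]
O = [[[0.8, 0.2],
      [0.3, 0.7]]]

b0 = [0.5, 0.5]
b1 = belief_update(b0, action=0, observation=0, T=T, O=O)
print(b1)
```

A value function in this setting maps such belief vectors, rather than individual states, to expected returns.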
