2019 International Conference on Robotics and Automation

Bridging Hamilton-Jacobi Safety Analysis and Reinforcement Learning



Abstract

Safety analysis is a necessary component in the design and deployment of autonomous robotic systems. Techniques from robust optimal control theory, such as Hamilton-Jacobi reachability analysis, allow a rigorous formalization of safety as guaranteed constraint satisfaction. Unfortunately, the computational complexity of these tools for general dynamical systems scales poorly with state dimension, making existing tools impractical beyond small problems. Modern reinforcement learning methods have shown promising ability to find approximate yet proficient solutions to optimal control problems in complex and high-dimensional systems; however, their application has in practice been restricted to problems with an additive payoff over time, unsuitable for reasoning about safety. In recent work, we introduced a time-discounted modification of the problem of maximizing the minimum payoff over time, central to safety analysis, through a modified dynamic programming equation that induces a contraction mapping. Here, we show how a similar contraction mapping can render reinforcement learning techniques amenable to quantitative safety analysis as tools to approximate the safe set and optimal safety policy. This opens a new avenue of research connecting control-theoretic safety analysis and the reinforcement learning domain. We validate the correctness of our formulation by comparing safety results computed through Q-learning to analytic and numerical solutions, and demonstrate its scalability by learning safe sets and control policies for simulated systems of up to 18 state dimensions using value learning and policy gradient techniques.
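
The abstract refers to a modified, contraction-inducing dynamic programming equation and to Q-learning as one way to approximate the safe set. As a rough illustration only, the sketch below runs tabular Q-learning with a discounted safety backup of that general form, Q(s,a) ← (1−γ)·l(s) + γ·min{l(s), max_a′ Q(s′,a′)}, on a toy one-dimensional system. The margin function l, the dynamics, and all hyperparameters are assumptions chosen for illustration; they are not taken from the paper.

```python
# Hedged sketch (not the authors' code): tabular Q-learning with a
# discounted minimum-payoff-over-time backup of the form described in
# the abstract, on a toy 1-D discretized system. Margin, dynamics, and
# hyperparameters below are illustrative assumptions.
import numpy as np

n_states, n_actions = 21, 3          # states 0..20, actions {-1, 0, +1}
gamma, alpha, episodes = 0.95, 0.1, 5000
rng = np.random.default_rng(0)

def margin(s):
    # l(s): signed distance to the constraint set; positive inside the
    # allowed region [5, 15], negative outside (assumed for illustration).
    return min(s - 5, 15 - s)

def step(s, a):
    # Simple deterministic dynamics: move left / stay / move right, clipped.
    return int(np.clip(s + (a - 1), 0, n_states - 1))

Q = np.zeros((n_states, n_actions))
for _ in range(episodes):
    s = int(rng.integers(n_states))
    for _ in range(50):
        # epsilon-greedy exploration
        a = int(rng.integers(n_actions)) if rng.random() < 0.2 else int(Q[s].argmax())
        s_next = step(s, a)
        # Discounted safety backup (contraction-inducing form):
        # Q(s,a) <- (1-gamma)*l(s) + gamma*min( l(s), max_a' Q(s',a') )
        target = (1 - gamma) * margin(s) + gamma * min(margin(s), Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

# States whose learned value max_a Q(s,a) is positive approximate the safe set
# in this toy setting.
safe_set = [s for s in range(n_states) if Q[s].max() > 0]
print("approximate safe set:", safe_set)
```

In this toy setting, the learned safe set should roughly coincide with the interior of the allowed region, since the controller can always steer the state back toward it; the same backup structure is what allows standard value-learning machinery to be reused for safety analysis.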


