Conference: AIAA SciTech Forum and Exposition

Learning How to Soar: Steady State Autonomous Dynamic Soaring Through Reinforcement Learning

Abstract

Dynamic soaring (DS) is a cyclic climbing and diving maneuver that enables birds and gliding aircraft to stay aloft perpetually when it is performed in the presence of a horizontal wind gradient. An autonomous aircraft capable of performing a dynamic soaring maneuver could do the same, unlocking the potential for expansive travel beyond the scale of what seabirds like the albatross can achieve. This work formulates dynamic soaring of a small autonomous gliding aircraft in a shear wind gradient as an optimal control problem. We describe an online reinforcement learning controller that can execute the DS maneuver to a steady state. The learning controller is taught by a tracking controller that has been shown to achieve steady state dynamic soaring control in simulation under stable environmental conditions that are known to the UAV. Our experiments in simulation show that, once fully trained, the learning controller outperforms the teaching controller in terms of energy gain per cycle and the number of cycles needed to reach a steady state DS orbit (4 versus 6 for the teaching controller). In a final comparison, we test both controllers in a mismatch scenario where the measured wind speed error is high. We show that where the teaching controller destabilizes and crashes after a few orbits, the learning controller adapts quickly and converges to a successful steady state orbit.
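The two comparison metrics named in the abstract, energy gain per cycle and number of cycles to reach a steady state orbit, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the tolerance `tol`, and the choice to measure specific mechanical energy once per cycle (including which reference frame the speed is taken in) are all assumptions made here for illustration.

```python
from typing import List, Optional, Sequence

G = 9.81  # gravitational acceleration in m/s^2


def specific_energy(altitude_m: float, speed_ms: float) -> float:
    """Specific mechanical energy (J/kg): potential plus kinetic.

    Assumption: the caller picks the reference frame for speed_ms.
    """
    return G * altitude_m + 0.5 * speed_ms ** 2


def energy_gain_per_cycle(energies: Sequence[float]) -> List[float]:
    """Change in energy between consecutive cycle boundaries."""
    return [later - earlier for earlier, later in zip(energies, energies[1:])]


def cycles_to_steady_state(
    energies: Sequence[float], tol: float = 1.0
) -> Optional[int]:
    """Return the 1-based index of the first cycle from which every
    per-cycle energy change stays within tol, or None if that never
    happens in the recorded data.

    Hypothetical criterion: the source does not say how steady state
    is detected.
    """
    gains = energy_gain_per_cycle(energies)
    for i in range(len(gains)):
        if all(abs(g) <= tol for g in gains[i:]):
            return i + 1
    return None
```

Under this criterion, a controller that settles sooner yields a smaller cycle count, which is the sense in which the abstract's "4 versus 6" comparison can be read.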