Source: Sensors (Basel, Switzerland), via the U.S. National Institutes of Health literature collection

Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance


Abstract

Collaboration among multiple unmanned aerial vehicles (UAVs) has great potential. To increase the intelligence and environmental adaptability of multi-UAV control, we study the application of deep reinforcement learning to multi-UAV cooperative control. In a multi-agent setting, each learning agent's changing policy makes the environment non-stationary for the others. To address this, the paper presents an improved multiagent reinforcement learning algorithm, the multiagent joint proximal policy optimization (MAJPPO) algorithm, which uses centralized learning and decentralized execution. The algorithm applies a moving-window averaging method so that each agent obtains a centralized state value function, which allows the agents to collaborate better. The improved algorithm enhances collaboration and increases the total reward obtained by the multiagent system. To evaluate its performance, we apply MAJPPO to a task in which multiple UAVs fly in formation and cross environments containing multiple obstacles. To reduce the control complexity of the UAV, we use a six-degree-of-freedom, 12-state dynamics model of the UAV with an attitude control loop. The experimental results show that the MAJPPO algorithm has better performance and better environmental adaptability.
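
The abstract does not say how the moving-window average is computed, so the following Python sketch is only one plausible reading and should not be taken as the paper's implementation. It assumes the joint value is the mean of the agents' individual value estimates, smoothed over the last few time steps. The class name JointValueAverager and the window parameter are hypothetical. The second function is the standard PPO clipped surrogate term, included only to show where a value-based advantage would enter.

```python
from collections import deque
from statistics import fmean


class JointValueAverager:
    """Hypothetical helper: averages per-agent value estimates over a sliding window.

    Assumption (not from the paper): the joint value is the mean across agents,
    then the mean across the most recent `window` time steps.
    """

    def __init__(self, window: int):
        if window < 1:
            raise ValueError("window must be >= 1")
        self._history = deque(maxlen=window)  # one cross-agent mean per time step

    def update(self, agent_values):
        """Record this step's per-agent estimates and return the joint value."""
        self._history.append(fmean(agent_values))  # average across agents
        return fmean(self._history)                # average across the window


def ppo_clip_term(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate term for a single sample."""
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)
```

In this sketch every agent would be trained against the same smoothed joint value during centralized learning, while each agent's policy still acts on its own observations at execution time.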
