Multi-agent reinforcement learning with approximate model learning for competitive games

Abstract

We propose a method for learning multi-agent policies to compete against multiple opponents. The method consists of recurrent neural network-based actor-critic networks and deterministic policy gradients that promote cooperation between agents through communication. The learning process does not require access to the opponents' parameters or observations, because the agents are trained separately from the opponents. The actor networks enable the agents to communicate via forward and backward paths, while the critic network helps train the actors by delivering gradient signals based on each actor's contribution to the global reward. Moreover, to address the nonstationarity caused by the evolution of the other agents, we propose approximate model learning that uses auxiliary prediction networks to model the state transitions, the reward function, and opponent behavior. In the test phase, we use competitive multi-agent environments to demonstrate, by comparison, the usefulness and superiority of the proposed method in terms of learning efficiency and goal achievement. The comparison results show that the proposed method outperforms the alternatives.
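
The architecture described above can be made concrete with a small sketch. The following is a minimal, illustrative PyTorch implementation, not the authors' code: recurrent actors that pass messages along a forward chain, and a centralized critic whose gradient flows back through both the actions and the communication channel, which is one plausible reading of the forward and backward communication paths. All dimensions, class names, and the single-message-chain topology are assumptions for illustration.

    import torch
    import torch.nn as nn

    OBS_DIM, MSG_DIM, ACT_DIM, HID_DIM, N_AGENTS = 16, 8, 4, 32, 3

    class RecurrentActor(nn.Module):
        """One agent's policy: a GRU over its own observation and the incoming message."""
        def __init__(self):
            super().__init__()
            self.gru = nn.GRUCell(OBS_DIM + MSG_DIM, HID_DIM)
            self.action_head = nn.Linear(HID_DIM, ACT_DIM)   # deterministic action
            self.message_head = nn.Linear(HID_DIM, MSG_DIM)  # outgoing message

        def forward(self, obs, msg_in, h):
            h = self.gru(torch.cat([obs, msg_in], dim=-1), h)
            return torch.tanh(self.action_head(h)), torch.tanh(self.message_head(h)), h

    class CentralizedCritic(nn.Module):
        """Q(o_1..o_N, a_1..a_N): sees all observations and actions during training only."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(N_AGENTS * (OBS_DIM + ACT_DIM), HID_DIM),
                nn.ReLU(),
                nn.Linear(HID_DIM, 1),
            )

        def forward(self, all_obs, all_act):
            return self.net(torch.cat([all_obs, all_act], dim=-1))

    actors = [RecurrentActor() for _ in range(N_AGENTS)]
    critic = CentralizedCritic()
    obs = torch.randn(1, N_AGENTS, OBS_DIM)
    hidden = [torch.zeros(1, HID_DIM) for _ in range(N_AGENTS)]
    msg = torch.zeros(1, MSG_DIM)  # the first agent receives a zero message
    actions = []
    for i, actor in enumerate(actors):
        act, msg, hidden[i] = actor(obs[:, i], msg, hidden[i])
        actions.append(act)
    q = critic(obs.flatten(1), torch.cat(actions, dim=-1))
    actor_loss = -q.mean()  # deterministic policy gradient: ascend Q w.r.t. actions
    actor_loss.backward()   # gradients reach every actor through actions and messages

Because each message is produced by a differentiable head, the critic's gradient signal propagates backward through the message chain, so every agent is trained according to its contribution to the shared Q-value.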

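The approximate model learning can be sketched in the same spirit: auxiliary prediction heads on a shared trunk estimate the next observation, the reward, and the opponent's action from the current observation and the agent's own action, trained by regression on replayed transitions. Again, every name and dimension below is an assumption for illustration, not the paper's exact design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    OBS_DIM, ACT_DIM, OPP_ACT_DIM, HID_DIM = 16, 4, 4, 32

    class ApproxModel(nn.Module):
        """Auxiliary prediction networks: transition, reward, and opponent models."""
        def __init__(self):
            super().__init__()
            self.trunk = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, HID_DIM), nn.ReLU())
            self.next_obs_head = nn.Linear(HID_DIM, OBS_DIM)      # state transitions
            self.reward_head = nn.Linear(HID_DIM, 1)              # reward function
            self.opponent_head = nn.Linear(HID_DIM, OPP_ACT_DIM)  # opponent behavior

        def forward(self, obs, act):
            z = self.trunk(torch.cat([obs, act], dim=-1))
            return self.next_obs_head(z), self.reward_head(z), self.opponent_head(z)

    # Regression losses on a (hypothetical) batch of replayed transitions.
    model = ApproxModel()
    obs, act = torch.randn(64, OBS_DIM), torch.randn(64, ACT_DIM)
    next_obs, reward = torch.randn(64, OBS_DIM), torch.randn(64, 1)
    opp_act = torch.randn(64, OPP_ACT_DIM)
    pred_obs, pred_rew, pred_opp = model(obs, act)
    aux_loss = (F.mse_loss(pred_obs, next_obs)
                + F.mse_loss(pred_rew, reward)
                + F.mse_loss(pred_opp, opp_act))
    aux_loss.backward()  # the shared trunk learns features that track the environment

Predicting opponent behavior from the agent's own observations is one way to counter the nonstationarity the abstract mentions: as the opponents evolve, the auxiliary losses push the shared representation to keep up without requiring access to the opponents' parameters.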