IEEE International Conference on Tools with Artificial Intelligence

Experience Selection in Multi-agent Deep Reinforcement Learning



Abstract

Experience replay is a crucial technique for off-policy deep reinforcement learning: a portion of memory is set aside as a replay buffer that stores previous experience samples for later policy updates. Because each experience sample can be reused multiple times, experience replay greatly improves sample utilization. However, how to effectively combine experience replay with multi-agent reinforcement learning remains an open challenge. In multi-agent reinforcement learning, an agent's decisions must account for the dynamics of the environment as well as the behavior of other agents. If the policies of other agents change, updating the current policy with stale experience samples may degrade the agent's subsequent decisions. Some methods use a small-capacity replay buffer that stores only recent experience samples. Although this avoids the problem of stored samples becoming incompatible with the current policy, it reduces the diversity of samples in the replay buffer, which can leave agents unable to learn the optimal policy. This paper eases this conflict by enhancing the experience selection mechanism: 1) we use a reservoir retention algorithm to increase the diversity of experience samples in the replay buffer; 2) we use prioritized experience replay to alleviate the incompatibility between stored experience samples and the current policy. Experimental results on the covert communication problem confirm the effectiveness of the proposed method.
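To make the two mechanisms in the abstract concrete, the following is a minimal, self-contained sketch of a replay buffer that combines reservoir-style retention (each transition ever offered has an equal chance of remaining in the buffer, preserving diversity) with priority-proportional sampling. It is an illustration under assumed defaults, not the authors' implementation; the class name, the priority exponent alpha, and the TD-error-based priority update are hypothetical choices.

```python
import numpy as np

class ReservoirPrioritizedBuffer:
    """Illustrative sketch: reservoir retention for insertion,
    prioritized sampling for selection (not the paper's code)."""

    def __init__(self, capacity, alpha=0.6, seed=0):
        self.capacity = capacity
        self.alpha = alpha              # priority exponent (assumed default)
        self.rng = np.random.default_rng(seed)
        self.samples = []               # stored transitions
        self.priorities = []            # one priority per stored transition
        self.seen = 0                   # total transitions offered so far

    def add(self, transition, priority=1.0):
        """Reservoir retention: once full, a new transition replaces a
        random slot with probability capacity / seen, so every transition
        seen so far is equally likely to remain in the buffer."""
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(transition)
            self.priorities.append(priority)
        else:
            j = self.rng.integers(0, self.seen)   # uniform over all seen
            if j < self.capacity:
                self.samples[j] = transition
                self.priorities[j] = priority

    def sample(self, batch_size):
        """Prioritized replay: draw transitions with probability
        proportional to priority ** alpha, favouring samples that are
        still informative under the current policy."""
        p = np.asarray(self.priorities) ** self.alpha
        p /= p.sum()
        idx = self.rng.choice(len(self.samples), size=batch_size, p=p)
        return idx, [self.samples[i] for i in idx]

    def update_priorities(self, idx, new_priorities):
        """After a learning step, refresh priorities (e.g. from TD errors)."""
        for i, pr in zip(idx, new_priorities):
            self.priorities[i] = float(pr)
```

In use, each agent would call add() on every new transition, sample() to draw a training batch, and update_priorities() with the absolute TD errors of that batch, so that samples inconsistent with the current joint policy are gradually de-emphasized while the reservoir keeps older, diverse experience available.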
