IEEE International Conference on Tools with Artificial Intelligence

Experience Selection in Multi-agent Deep Reinforcement Learning



Abstract

Experience replay is a crucial technique for off-policy deep reinforcement learning: a portion of memory is set aside as a replay buffer that stores previous experience samples for later policy updates. Because each experience sample can be reused multiple times, experience replay greatly improves sample utilization. However, how to effectively combine experience replay with multi-agent reinforcement learning remains an open challenge. In multi-agent reinforcement learning, an agent's decisions must account for the dynamics of the environment as well as the behavior of other agents. If the policies of other agents change, updating the current policy with stale experience samples may degrade the agent's subsequent decisions. Some methods use a small-capacity replay buffer that stores only recent experience samples. Although this avoids the problem of stored samples becoming incompatible with the current policy, it reduces the diversity of samples in the replay buffer, which can leave agents unable to learn the optimal policy. This paper eases this conflict by enhancing the experience selection mechanism: 1) we use a reservoir retention algorithm to increase the diversity of experience samples in the replay buffer; 2) we use prioritized experience replay to alleviate the incompatibility between stored experience samples and the current policy. Experimental results on the covert communication problem confirm the effectiveness of the proposed method.
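To make the two mechanisms in the abstract concrete, the following is a minimal, self-contained sketch of a replay buffer that combines reservoir-style retention (each transition ever offered has an equal chance of remaining in the buffer, preserving diversity) with priority-proportional sampling. It is an illustration under assumed defaults, not the authors' implementation; the class name, the priority exponent alpha, and the TD-error-based priority update are hypothetical choices.

```python
import numpy as np

class ReservoirPrioritizedBuffer:
    """Illustrative sketch: reservoir retention for insertion,
    prioritized sampling for selection (not the paper's code)."""

    def __init__(self, capacity, alpha=0.6, seed=0):
        self.capacity = capacity
        self.alpha = alpha              # priority exponent (assumed default)
        self.rng = np.random.default_rng(seed)
        self.samples = []               # stored transitions
        self.priorities = []            # one priority per stored transition
        self.seen = 0                   # total transitions offered so far

    def add(self, transition, priority=1.0):
        """Reservoir retention: once full, a new transition replaces a
        random slot with probability capacity / seen, so every transition
        seen so far is equally likely to remain in the buffer."""
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(transition)
            self.priorities.append(priority)
        else:
            j = self.rng.integers(0, self.seen)   # uniform over all seen
            if j < self.capacity:
                self.samples[j] = transition
                self.priorities[j] = priority

    def sample(self, batch_size):
        """Prioritized replay: draw transitions with probability
        proportional to priority ** alpha, favouring samples that are
        still informative under the current policy."""
        p = np.asarray(self.priorities) ** self.alpha
        p /= p.sum()
        idx = self.rng.choice(len(self.samples), size=batch_size, p=p)
        return idx, [self.samples[i] for i in idx]

    def update_priorities(self, idx, new_priorities):
        """After a learning step, refresh priorities (e.g. from TD errors)."""
        for i, pr in zip(idx, new_priorities):
            self.priorities[i] = float(pr)
```

In use, each agent would call add() on every new transition, sample() to draw a training batch, and update_priorities() with the absolute TD errors of that batch, so that samples inconsistent with the current joint policy are gradually de-emphasized while the reservoir keeps older, diverse experience available.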
