Duplicated Replay Buffer for Asynchronous Deep Deterministic Policy Gradient

Abstract

Off-policy Deep Reinforcement Learning (DRL) algorithms such as Deep Deterministic Policy Gradient (DDPG) have been used to teach intelligent agents to solve complicated problems in continuous action-space environments. Several methods, such as experience replay for selecting a batch of transitions from the replay memory buffer, have been applied successfully to improve training performance and achieve better speed and stability for these algorithms. However, environments with sparse reward functions remain a challenge for these algorithms and degrade their performance. This research aims to make the transition selection process more efficient by increasing the likelihood of selecting important transitions from the replay memory buffer. Our proposed method works better with sparse reward functions and, in particular, with environments that have termination conditions. We use a secondary replay memory buffer that stores the more critical transitions. During training, transitions are sampled from both the primary replay buffer and the secondary replay buffer. We also use parallel environments to execute asynchronously and fill the primary and secondary replay buffers. This method helps us obtain better performance and stability. Finally, we evaluate our proposed approach on the Crawler model, one of the Unity ML-Agents tasks with a sparse reward function, against DDPG and AE-DDPG.
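
As a rough illustration of the mechanism described in the abstract, the sketch below shows one way a primary replay buffer and a secondary buffer of critical transitions could be combined, with each training batch drawn from both. The class and parameter names (DualReplayBuffer, critical_fraction) and the rule for flagging a transition as critical are assumptions made for illustration, not the paper's actual implementation.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Sketch: primary replay buffer plus a secondary buffer of critical transitions."""

    def __init__(self, capacity=100_000, critical_capacity=20_000,
                 critical_fraction=0.5):
        self.primary = deque(maxlen=capacity)              # all transitions
        self.secondary = deque(maxlen=critical_capacity)   # duplicated critical transitions
        self.critical_fraction = critical_fraction         # share of each batch from secondary

    def add(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        self.primary.append(transition)
        # Assumed criterion: terminal transitions (or any nonzero sparse reward)
        # count as critical and are duplicated into the secondary buffer.
        if done or reward != 0:
            self.secondary.append(transition)

    def sample(self, batch_size):
        # Draw part of the batch from the critical buffer, the rest from the primary one.
        n_critical = min(int(batch_size * self.critical_fraction), len(self.secondary))
        n_regular = min(batch_size - n_critical, len(self.primary))
        batch = (random.sample(list(self.secondary), n_critical)
                 + random.sample(list(self.primary), n_regular))
        random.shuffle(batch)
        return batch
```

In an asynchronous setup, several environment workers would push transitions through add() while the learner repeatedly calls sample() to update the DDPG actor and critic.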
