Duplicated Replay Buffer for Asynchronous Deep Deterministic Policy Gradient

Abstract

Off-policy Deep Reinforcement Learning (DRL) algorithms such as Deep Deterministic Policy Gradient (DDPG) have been used to teach intelligent agents to solve complicated problems in continuous action-space environments. Several methods, such as experience replay for selecting a batch of transitions from the replay memory buffer, have been applied successfully to improve training performance and achieve better speed and stability for these algorithms. However, environments with sparse reward functions remain a challenge for these algorithms and degrade their performance. This research aims to make the transition selection process more efficient by increasing the likelihood of selecting important transitions from the replay memory buffer. Our proposed method works better with sparse reward functions and, in particular, with environments that have termination conditions. We use a secondary replay memory buffer that stores the more critical transitions. During training, transitions are sampled from both the primary replay buffer and the secondary replay buffer. We also use parallel environments to execute asynchronously and fill the primary and secondary replay buffers. This method helps us obtain better performance and stability. Finally, we evaluate our proposed approach on the Crawler model, one of the Unity ML-Agents tasks with a sparse reward function, against DDPG and AE-DDPG.
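
As a rough illustration of the mechanism described in the abstract, the sketch below shows one way a primary replay buffer and a secondary buffer of critical transitions could be combined, with each training batch drawn from both. The class and parameter names (DualReplayBuffer, critical_fraction) and the rule for flagging a transition as critical are assumptions made for illustration, not the paper's actual implementation.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Sketch: primary replay buffer plus a secondary buffer of critical transitions."""

    def __init__(self, capacity=100_000, critical_capacity=20_000,
                 critical_fraction=0.5):
        self.primary = deque(maxlen=capacity)              # all transitions
        self.secondary = deque(maxlen=critical_capacity)   # duplicated critical transitions
        self.critical_fraction = critical_fraction         # share of each batch from secondary

    def add(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        self.primary.append(transition)
        # Assumed criterion: terminal transitions (or any nonzero sparse reward)
        # count as critical and are duplicated into the secondary buffer.
        if done or reward != 0:
            self.secondary.append(transition)

    def sample(self, batch_size):
        # Draw part of the batch from the critical buffer, the rest from the primary one.
        n_critical = min(int(batch_size * self.critical_fraction), len(self.secondary))
        n_regular = min(batch_size - n_critical, len(self.primary))
        batch = (random.sample(list(self.secondary), n_critical)
                 + random.sample(list(self.primary), n_regular))
        random.shuffle(batch)
        return batch
```

In an asynchronous setup, several environment workers would push transitions through add() while the learner repeatedly calls sample() to update the DDPG actor and critic.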
