Enhanced Reinforcement Learning with Targeted Dropout

Abstract

In recent years, research on Reinforcement Learning (RL) has centered on Deep Q-Network (DQN) optimization for learning prediction and control in Markov decision processes (MDPs). In this paper, the researchers apply the Targeted Dropout strategy to the RL DQN, integrating it directly into learning; such a strategy is needed to handle MDPs with huge or continuous state and action spaces. At every weight/unit update, targeted dropout selects a set of elements, keeps only the weights/units of largest magnitude, and applies dropout to the selected set. It is related to common pruning strategies that focus on fast approximations, such as removing the weights with the smallest magnitude, or ranking the weights/units by the sensitivity of the network design, or even by the sensitivity of task performance with respect to the weights/units, and removing the least sensitive ones. The results show that the proposed algorithm for enhancing the RL DQN is more accurate in finding the best action to learn in order to achieve maximum reward. In simulation, it reaches the maximum average reward in a minimal number of episodes, whereas without Targeted Dropout more runs are needed to reach that average reward; throughout the evaluation, the proposed algorithm learns more in finding large reward values.
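
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the weight-level variant of targeted dropout, in which the smallest-magnitude fraction of weights becomes the candidate set and each candidate is zeroed with a fixed probability. The function name `targeted_dropout` and the parameters `targeting_rate` and `drop_rate` are hypothetical names chosen for illustration, and the weights are a plain Python list rather than a DQN layer.

```python
import random


def targeted_dropout(weights, targeting_rate=0.5, drop_rate=0.5, rng=None):
    """Return a copy of `weights` with targeted dropout applied.

    Assumed scheme (illustrative only): the `targeting_rate` fraction of
    weights with the smallest magnitude become drop candidates, and each
    candidate is zeroed independently with probability `drop_rate`.
    Weights outside the candidate set are always kept.
    """
    if rng is None:
        rng = random.Random()
    n_candidates = int(targeting_rate * len(weights))
    # Indices ordered by ascending magnitude; the first n_candidates are targeted.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    out = list(weights)
    for i in order[:n_candidates]:
        if rng.random() < drop_rate:
            out[i] = 0.0
    return out


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.1]

# Stochastic case: only the three smallest-magnitude weights can be dropped.
dropped = targeted_dropout(weights, targeting_rate=0.5, drop_rate=0.5,
                           rng=random.Random(0))

# With drop_rate=1.0 every candidate is removed, which reduces to plain
# magnitude pruning of the targeted fraction.
pruned = targeted_dropout(weights, targeting_rate=0.5, drop_rate=1.0)
```

In this sketch the large-magnitude weights 0.9, 0.4 and -0.7 survive every call, while the remaining three are at risk. Setting `drop_rate` to 1.0 recovers the magnitude-pruning baseline the abstract mentions, which is one way to see how the two ideas are connected.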


Bibliographic details

Source: International Conference on Digitization, 2019, 1 v., 5 pages
Authors: Mark Jovic A. Daday; Kristoffer Franz Mari R. Millado
Original format: PDF
CLC classification: Computing technology, computer technology
Keywords: Artificial Intelligence; Machine Learning; Reinforcement Learning; Dropout Approximation; Markov Decision Processes (MDP); Deep Q-Network (DQN); Targeted Dropout


