10th International Conference on Algorithmic Learning Theory (ALT'99), Tokyo, Japan, December 6-8, 1999

Faster Near-Optimal Reinforcement Learning: Adding Adaptiveness to the E^3 Algorithm



Abstract

Recently, Kearns and Singh presented the first provably efficient and near-optimal algorithm for reinforcement learning in general Markov decision processes. One of the key contributions of the algorithm is its explicit treatment of the exploration-exploitation trade-off. In this paper, we show how the algorithm can be improved by replacing its exploration phase, which builds a model of the underlying Markov decision process by estimating the transition probabilities, with an adaptive sampling method better suited to the problem. Our improvement is twofold. First, our theoretical bound on the worst-case time needed to converge to an almost optimal policy is significantly smaller. Second, owing to the adaptiveness of the sampling method we use, we discuss how our algorithm might perform better in practice than the previous one.
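
The abstract contrasts E^3's exploration phase, which estimates transition probabilities by visiting each state-action pair a fixed number of times, with an adaptive sampling scheme. Below is a minimal illustrative sketch in Python of the general adaptive-sampling idea, not the paper's actual method: keep sampling next states for a single state-action pair until a confidence radius on the empirical transition estimate falls below a target accuracy. The function names, the Hoeffding-style L1 bound, and the stopping rule are assumptions made for illustration.

# Illustrative sketch only (not the paper's algorithm): adaptively sample
# next states for one (state, action) pair until the empirical transition
# estimate is accurate enough with high probability, instead of using a
# fixed number of visits as in E^3's exploration phase.
import math
import random
from collections import Counter

def estimate_transitions(sample_next_state, num_states,
                         accuracy=0.1, delta=0.05, max_samples=100_000):
    """Sample next states until a Hoeffding-style L1 confidence radius on the
    empirical transition distribution drops below `accuracy` (prob. 1 - delta)."""
    counts = Counter()
    n = 0
    while n < max_samples:
        counts[sample_next_state()] += 1
        n += 1
        # L1 deviation bound for an empirical distribution over num_states outcomes
        radius = math.sqrt(2 * (num_states * math.log(2) + math.log(1 / delta)) / n)
        if radius <= accuracy:
            break
    return {s: c / n for s, c in counts.items()}, n

if __name__ == "__main__":
    # Toy example: a hidden two-outcome transition distribution.
    true_p = {0: 0.8, 1: 0.2}
    draw = lambda: random.choices(list(true_p), weights=list(true_p.values()))[0]
    estimate, samples_used = estimate_transitions(draw, num_states=2)
    print(f"used {samples_used} samples, estimate: {estimate}")

The point of the sketch is that the number of samples is driven by the observed data rather than fixed in advance, which is the sense in which an adaptive method can stop earlier than a worst-case schedule.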


