10th International Conference on Algorithmic Learning Theory (ALT'99), Tokyo, Japan, December 6-8, 1999

Faster Near-Optimal Reinforcement Learning: Adding Adaptiveness to the E^3 Algorithm



Abstract

Recently, Kearns and Singh presented the first provably efficient and near-optimal algorithm for reinforcement learning in general Markov decision processes. One of the key contributions of the algorithm is its explicit treatment of the exploration-exploitation trade-off. In this paper, we show how the algorithm can be improved by replacing its exploration phase, which builds a model of the underlying Markov decision process by estimating the transition probabilities, with an adaptive sampling method better suited to the problem. Our improvement is twofold. First, our theoretical bound on the worst-case time needed to converge to an almost optimal policy is significantly smaller. Second, owing to the adaptiveness of the sampling method we use, we discuss how our algorithm might perform better in practice than the previous one.
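
The abstract contrasts E^3's exploration phase, which estimates transition probabilities by visiting each state-action pair a fixed number of times, with an adaptive sampling scheme. Below is a minimal illustrative sketch in Python of the general adaptive-sampling idea, not the paper's actual method: keep sampling next states for a single state-action pair until a confidence radius on the empirical transition estimate falls below a target accuracy. The function names, the Hoeffding-style L1 bound, and the stopping rule are assumptions made for illustration.

# Illustrative sketch only (not the paper's algorithm): adaptively sample
# next states for one (state, action) pair until the empirical transition
# estimate is accurate enough with high probability, instead of using a
# fixed number of visits as in E^3's exploration phase.
import math
import random
from collections import Counter

def estimate_transitions(sample_next_state, num_states,
                         accuracy=0.1, delta=0.05, max_samples=100_000):
    """Sample next states until a Hoeffding-style L1 confidence radius on the
    empirical transition distribution drops below `accuracy` (prob. 1 - delta)."""
    counts = Counter()
    n = 0
    while n < max_samples:
        counts[sample_next_state()] += 1
        n += 1
        # L1 deviation bound for an empirical distribution over num_states outcomes
        radius = math.sqrt(2 * (num_states * math.log(2) + math.log(1 / delta)) / n)
        if radius <= accuracy:
            break
    return {s: c / n for s, c in counts.items()}, n

if __name__ == "__main__":
    # Toy example: a hidden two-outcome transition distribution.
    true_p = {0: 0.8, 1: 0.2}
    draw = lambda: random.choices(list(true_p), weights=list(true_p.values()))[0]
    estimate, samples_used = estimate_transitions(draw, num_states=2)
    print(f"used {samples_used} samples, estimate: {estimate}")

The point of the sketch is that the number of samples is driven by the observed data rather than fixed in advance, which is the sense in which an adaptive method can stop earlier than a worst-case schedule.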


