
Combining Exploitation-Based and Exploration-Based Approach in Reinforcement Learning

Abstract

Watkins' Q-learning is the most popular and effective model-free method. However, compared with model-based approaches, Q-learning with various exploration strategies requires a large number of trial-and-error interactions to find an optimal policy. To overcome this drawback, we propose a new model-based learning method that extends Q-learning. The method maintains separate EI and ER functions for learning an exploitation-based and an exploration-based model, respectively. The EI function, based on statistics, indicates the best action. The ER function, based on exploration information, leads the learner toward poorly known regions of the global state space by backing up at each step. We also introduce a new criterion that serves as this exploration information. By combining the two functions, we can pursue exploitation and exploration strategies effectively and select an action that takes both strategies into account simultaneously.
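The abstract does not specify the concrete forms of the EI and ER functions or of the exploration criterion. The sketch below is a minimal illustration of the general idea only, assuming tabular values, a hypothetical visit-count-based exploration bonus as the "information of exploration", and a weighting parameter beta; none of these details come from the paper.

```python
import numpy as np

class CombinedAgent:
    """Sketch: keep an exploitation value (EI) and an exploration value (ER)
    per state-action pair, back both up every step, and act greedily on
    their combination so one choice reflects both strategies."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, beta=1.0):
        self.q = np.zeros((n_states, n_actions))       # EI: exploitation values
        self.e = np.ones((n_states, n_actions))        # ER: exploration values
        self.counts = np.zeros((n_states, n_actions))  # visit counts (assumed criterion)
        self.alpha, self.gamma, self.beta = alpha, gamma, beta

    def select_action(self, s):
        # Combine the two value functions into a single score.
        return int(np.argmax(self.q[s] + self.beta * self.e[s]))

    def update(self, s, a, r, s_next):
        self.counts[s, a] += 1
        # EI backup: ordinary Q-learning target on the external reward.
        td_target = r + self.gamma * np.max(self.q[s_next])
        self.q[s, a] += self.alpha * (td_target - self.q[s, a])
        # ER backup: an "exploration reward" that shrinks with visitation,
        # pulling the learner toward poorly known regions of the state space.
        explore_r = 1.0 / np.sqrt(self.counts[s, a])
        er_target = explore_r + self.gamma * np.max(self.e[s_next])
        self.e[s, a] += self.alpha * (er_target - self.e[s, a])
```

A driving loop would repeatedly call select_action, apply the chosen action in the environment, and pass the observed transition to update, so both value functions are backed up on every step.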
