
Efficient Reinforcement Learning in Parameterized Models: Discrete Parameter Case

In: Recent Advances in Reinforcement Learning


Abstract

We consider reinforcement learning in the parameterized setup, where the model is known to belong to a finite set of Markov Decision Processes (MDPs) under the discounted return criterion. We propose an on-line algorithm for learning in such parameterized models, the Parameter Elimination (PEL) algorithm, and analyze its performance in terms of the total mistake bound criterion. The algorithm relies on Wald's sequential probability ratio test to eliminate unlikely parameters, and uses an optimistic policy for effective exploration. We establish that, with high probability, the total mistake bound for the algorithm is linear (up to a logarithmic term) in the size |Θ| of the parameter space, independently of the cardinality of the state and action spaces. We further demonstrate that much better dependence on |Θ| is possible, depending on the specific information structure of the problem.
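
To make the elimination mechanism concrete, below is a minimal Python sketch of likelihood-based parameter elimination. It assumes each candidate parameter θ indexes a fully specified transition model and uses a simple log-likelihood-ratio threshold as a stand-in for Wald's sequential test; the class name DiscreteParameterElimination, the threshold value, and the toy two-state example are illustrative assumptions and do not reproduce the paper's exact PEL procedure.

```python
import math
import random


class DiscreteParameterElimination:
    """Illustrative elimination of unlikely candidate MDP parameters (a sketch,
    not the paper's exact PEL algorithm)."""

    def __init__(self, models, threshold=math.log(100.0)):
        # models: dict mapping theta -> nested dict P[s][a] = list of
        # next-state probabilities (one fully known transition model per theta).
        self.models = dict(models)
        self.log_lik = {theta: 0.0 for theta in models}  # cumulative log-likelihoods
        self.threshold = threshold  # SPRT-style elimination margin (assumed value)

    def surviving(self):
        return set(self.log_lik)

    def update(self, s, a, s_next):
        """Fold one observed transition (s, a, s') into every surviving theta."""
        for theta in list(self.log_lik):
            p = self.models[theta][s][a][s_next]
            # A parameter assigning probability 0 to an observed event is ruled out.
            self.log_lik[theta] += math.log(p) if p > 0 else float("-inf")
        best = max(self.log_lik.values())
        # Eliminate any theta whose likelihood ratio against the current leader
        # has fallen below the threshold.
        for theta in list(self.log_lik):
            if best - self.log_lik[theta] > self.threshold:
                del self.log_lik[theta]
                del self.models[theta]


if __name__ == "__main__":
    # Two candidate models on states {0, 1} with a single action 0; they differ
    # only in how likely the system is to stay in state 0.
    models = {
        "theta_A": {0: {0: [0.9, 0.1]}, 1: {0: [0.5, 0.5]}},
        "theta_B": {0: {0: [0.5, 0.5]}, 1: {0: [0.5, 0.5]}},
    }
    pel = DiscreteParameterElimination(models)
    random.seed(0)
    s = 0
    for _ in range(200):
        # Data is generated by theta_A, so theta_B should eventually be eliminated.
        p_stay = models["theta_A"][s][0][0]
        s_next = 0 if random.random() < p_stay else 1
        pel.update(s, 0, s_next)
        s = s_next
    print("surviving parameters:", pel.surviving())
```

In the full PEL algorithm the agent would additionally act according to an optimistic policy over the surviving parameters (for example, following the policy of the surviving θ that promises the highest discounted return), which supplies the exploration needed to generate informative transitions; the sketch above isolates only the elimination step.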
