The Journal of Artificial Intelligence Research

Convergence of a Q-learning Variant for Continuous States and Actions


Abstract

This paper presents a reinforcement learning algorithm for solving infinite horizon Markov Decision Processes under the expected total discounted reward criterion when both the state and action spaces are continuous. This algorithm is based on Watkins' Q-learning, but uses Nadaraya-Watson kernel smoothing to generalize knowledge to unvisited states. As expected, continuity conditions must be imposed on the mean rewards and transition probabilities. Using results from kernel regression theory, this algorithm is proven capable of producing a Q-value function estimate that is uniformly within an arbitrary tolerance of the true Q-value function with probability one. The algorithm is then applied to an example problem to empirically show convergence as well.
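The abstract does not spell out the algorithm, but the core idea it describes can be illustrated compactly: Q-learning style targets whose value estimates come from Nadaraya-Watson kernel smoothing over previously visited state-action pairs, so that unvisited continuous states and actions inherit values from nearby samples. The sketch below is an assumption-laden illustration, not the paper's specification; the class name KernelQ, the Gaussian kernel, the bandwidth, the discount factor, and the discretized candidate-action set are all illustrative choices.

```python
# Minimal sketch (not the paper's exact algorithm) of Q-value estimation via
# Nadaraya-Watson kernel smoothing over stored (state, action) samples.
import numpy as np

def gaussian_kernel(u):
    """Gaussian kernel applied to a bandwidth-scaled distance."""
    return np.exp(-0.5 * u ** 2)

class KernelQ:
    def __init__(self, bandwidth=0.2, gamma=0.95):
        self.bandwidth = bandwidth   # kernel bandwidth h (assumed)
        self.gamma = gamma           # discount factor (assumed)
        self.points = []             # visited (state, action) pairs
        self.targets = []            # sampled Q-learning targets

    def q_value(self, state, action):
        """Nadaraya-Watson estimate: kernel-weighted average of stored targets."""
        if not self.points:
            return 0.0
        query = np.concatenate([np.atleast_1d(state), np.atleast_1d(action)])
        pts = np.array(self.points)
        dists = np.linalg.norm(pts - query, axis=1)
        w = gaussian_kernel(dists / self.bandwidth)
        if w.sum() == 0.0:
            return 0.0
        return float(np.dot(w, self.targets) / w.sum())

    def update(self, state, action, reward, next_state, candidate_actions):
        """Form a Q-learning target using the smoothed estimate at next_state."""
        best_next = max(self.q_value(next_state, a) for a in candidate_actions)
        target = reward + self.gamma * best_next
        self.points.append(
            np.concatenate([np.atleast_1d(state), np.atleast_1d(action)]))
        self.targets.append(target)

# Illustrative usage on scalar states/actions with a discretized action grid.
agent = KernelQ(bandwidth=0.3)
candidate_actions = np.linspace(-1.0, 1.0, 11)
agent.update(state=0.5, action=0.2, reward=1.0,
             next_state=0.6, candidate_actions=candidate_actions)
print(agent.q_value(0.5, 0.2))
```

The smoothing step is what distinguishes this from tabular Q-learning: instead of indexing a table, every query averages over all stored targets with kernel weights, which is also where the paper's continuity assumptions on mean rewards and transition probabilities come into play.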