The Journal of Artificial Intelligence Research

Convergence of a Q-learning Variant for Continuous States and Actions


Abstract

This paper presents a reinforcement learning algorithm for solving infinite horizon Markov Decision Processes under the expected total discounted reward criterion when both the state and action spaces are continuous. This algorithm is based on Watkins' Q-learning, but uses Nadaraya-Watson kernel smoothing to generalize knowledge to unvisited states. As expected, continuity conditions must be imposed on the mean rewards and transition probabilities. Using results from kernel regression theory, this algorithm is proven capable of producing a Q-value function estimate that is uniformly within an arbitrary tolerance of the true Q-value function with probability one. The algorithm is then applied to an example problem to empirically show convergence as well.
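The abstract does not spell out the algorithm, but the core idea it describes can be illustrated compactly: Q-learning style targets whose value estimates come from Nadaraya-Watson kernel smoothing over previously visited state-action pairs, so that unvisited continuous states and actions inherit values from nearby samples. The sketch below is an assumption-laden illustration, not the paper's specification; the class name KernelQ, the Gaussian kernel, the bandwidth, the discount factor, and the discretized candidate-action set are all illustrative choices.

```python
# Minimal sketch (not the paper's exact algorithm) of Q-value estimation via
# Nadaraya-Watson kernel smoothing over stored (state, action) samples.
import numpy as np

def gaussian_kernel(u):
    """Gaussian kernel applied to a bandwidth-scaled distance."""
    return np.exp(-0.5 * u ** 2)

class KernelQ:
    def __init__(self, bandwidth=0.2, gamma=0.95):
        self.bandwidth = bandwidth   # kernel bandwidth h (assumed)
        self.gamma = gamma           # discount factor (assumed)
        self.points = []             # visited (state, action) pairs
        self.targets = []            # sampled Q-learning targets

    def q_value(self, state, action):
        """Nadaraya-Watson estimate: kernel-weighted average of stored targets."""
        if not self.points:
            return 0.0
        query = np.concatenate([np.atleast_1d(state), np.atleast_1d(action)])
        pts = np.array(self.points)
        dists = np.linalg.norm(pts - query, axis=1)
        w = gaussian_kernel(dists / self.bandwidth)
        if w.sum() == 0.0:
            return 0.0
        return float(np.dot(w, self.targets) / w.sum())

    def update(self, state, action, reward, next_state, candidate_actions):
        """Form a Q-learning target using the smoothed estimate at next_state."""
        best_next = max(self.q_value(next_state, a) for a in candidate_actions)
        target = reward + self.gamma * best_next
        self.points.append(
            np.concatenate([np.atleast_1d(state), np.atleast_1d(action)]))
        self.targets.append(target)

# Illustrative usage on scalar states/actions with a discretized action grid.
agent = KernelQ(bandwidth=0.3)
candidate_actions = np.linspace(-1.0, 1.0, 11)
agent.update(state=0.5, action=0.2, reward=1.0,
             next_state=0.6, candidate_actions=candidate_actions)
print(agent.q_value(0.5, 0.2))
```

The smoothing step is what distinguishes this from tabular Q-learning: instead of indexing a table, every query averages over all stored targets with kernel weights, which is also where the paper's continuity assumptions on mean rewards and transition probabilities come into play.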