
Bias-Variance Error Bounds for Temporal Difference Updates



Abstract

Temporal difference (TD) algorithms are used in reinforcement learning to compute estimates of the value of a given policy in an unknown Markov decision process (policy evaluation). We give rigorous upper bounds on the error of the closely related phased TD algorithms (which differ from the standard updates in their treatment of the learning rate) as a function of the amount of experience. These upper bounds prove exponentially fast convergence, with both the rate of convergence and the asymptote strongly dependent on the length of the backups k or the parameter λ. Our bounds give formal verification to the well-known intuition that TD methods are subject to a bias-variance tradeoff, and they lead to schedules for k and λ that are predicted to be better than any fixed values for these parameters. We give preliminary experimental confirmation of our theory for a version of the random walk problem.
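For readers unfamiliar with the updates the abstract refers to, the following is a minimal illustrative sketch of tabular TD(λ) policy evaluation on a symmetric random walk, the same family of problem used in the paper's experiments. It shows the role of the trace parameter λ in the bias-variance tradeoff; it is the standard online TD(λ) update, not the paper's phased variant (which treats the learning rate differently), and all names, parameter values, and the chain size are assumptions chosen for illustration.

```python
import numpy as np

def td_lambda_random_walk(n_states=5, lam=0.8, alpha=0.1, episodes=200, seed=0):
    """Tabular TD(lambda) policy evaluation on a symmetric random walk.

    States 1..n_states with absorbing endpoints 0 and n_states+1;
    reward +1 only when the walk exits on the right, gamma = 1.
    Illustrative sketch only, not the paper's phased TD algorithm.
    """
    rng = np.random.default_rng(seed)
    V = np.zeros(n_states + 2)              # value estimates, incl. both terminals
    for _ in range(episodes):
        e = np.zeros_like(V)                # accumulating eligibility trace
        s = (n_states + 1) // 2             # start in the middle state
        while 0 < s < n_states + 1:
            s_next = s + (1 if rng.random() < 0.5 else -1)
            r = 1.0 if s_next == n_states + 1 else 0.0
            delta = r + V[s_next] - V[s]    # TD error (gamma = 1)
            e[s] += 1.0                     # bump trace for the visited state
            V += alpha * delta * e          # propagate the error along the trace
            e *= lam                        # decay traces by lambda (gamma = 1)
            s = s_next
    return V[1:n_states + 1]

# True values for the 5-state walk are [1/6, 2/6, 3/6, 4/6, 5/6].
print(td_lambda_random_walk())
```

Larger λ backs each error up over longer stretches of the trajectory (lower bias, higher variance); smaller λ relies more on the current estimates (higher bias, lower variance), which is the tradeoff the paper's bounds quantify.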
