
Prospective And Retrospective Temporal Difference Learning


Abstract

A striking recent finding is that monkeys behave maladaptively in a class of tasks in which they know that reward is going to be systematically delayed. This may be explained by a malign Pavlovian influence arising from states with low predicted values. However, by very carefully analyzing behavioral data from such tasks, La Camera and Richmond (2008) observed the additional important characteristic that subjects perform differently on states in the task that are at equal distances from the future reward, depending on what has happened in the recent past. The authors pointed out that this violates the definition of state value in the standard reinforcement learning models that are ubiquitous as accounts of operant and classical conditioned behavior; they suggested and analyzed an alternative temporal difference (TD) model in which past and future are melded. Here, we show that, in fact, a standard TD model can actually exhibit the same behavior, and that this avoids deleterious consequences for choice. At the heart of the model is the average reward per step, which acts as a baseline for measuring immediate rewards. Relatively subtle changes to this baseline occasioned by the past can markedly influence predictions and thus behavior.
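To make the mechanism described above concrete, here is a minimal sketch of tabular average-reward TD(0) learning, in which a running estimate of the average reward per step serves as the baseline against which each immediate reward is measured. The state layout, rewards, and learning rates below are illustrative assumptions for exposition, not details taken from the paper.

```python
import numpy as np

def average_reward_td(transitions, n_states, alpha=0.1, beta=0.01):
    """Tabular average-reward TD(0): a minimal sketch.

    rho, the estimated average reward per step, acts as the baseline
    in the TD error, so predictions at a given state can shift with
    recent history even when the distance to future reward is fixed.
    """
    V = np.zeros(n_states)   # differential (average-adjusted) state values
    rho = 0.0                # estimated average reward per step
    for s, r, s_next in transitions:
        # TD error uses (r - rho) in place of a discounted return
        delta = r - rho + V[s_next] - V[s]
        V[s] += alpha * delta
        rho += beta * delta  # slowly track the average-reward baseline
    return V, rho

# Illustrative example: a short cycle of states in which reward is delayed.
# A run of unrewarded steps lowers rho, which in turn raises the
# differential value of the states that precede the eventual reward.
episode = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, 0)] * 200
values, avg_reward = average_reward_td(episode, n_states=3)
print(values, avg_reward)
```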
