Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcements driving both action acquisition and reward maximization: A simulated robotic study

Mirolli M.; Santucci V.G.; Baldassarre G.

Abstract

An important issue of recent neuroscientific research is to understand the functional role of the phasic release of dopamine in the striatum, and in particular its relation to reinforcement learning. The literature is split between two alternative hypotheses: one considers phasic dopamine as a reward prediction error similar to the computational TD-error, whose function is to guide an animal to maximize future rewards; the other holds that phasic dopamine is a sensory prediction error signal that lets the animal discover and acquire novel actions. In this paper we propose an original hypothesis that integrates these two contrasting positions: according to our view phasic dopamine represents a TD-like reinforcement prediction error learning signal determined by both unexpected changes in the environment (temporary, intrinsic reinforcements) and biological rewards (permanent, extrinsic reinforcements). Accordingly, dopamine plays the functional role of driving both the discovery and acquisition of novel actions and the maximization of future rewards. To validate our hypothesis we perform a series of experiments with a simulated robotic system that has to learn different skills in order to get rewards. We compare different versions of the system in which we vary the composition of the learning signal. The results show that only the system reinforced by both extrinsic and intrinsic reinforcements is able to reach high performance in sufficiently complex conditions. © 2013 Elsevier Ltd.
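The abstract gives no equations, so the following is only a minimal sketch of what a TD-like error over combined reinforcements could look like. It assumes that the extrinsic and intrinsic terms are simply summed, and that the intrinsic term is the surprise at an environmental change, which fades as a predictor learns to anticipate it. All function names, the learning rate, and the discount factor are hypothetical and are not taken from the paper.

```python
def td_error(r_extrinsic, r_intrinsic, v_current, v_next, gamma=0.99):
    """TD-like prediction error on the sum of both reinforcement types.

    Assumption: the two reinforcements are added; the paper may combine
    them differently.
    """
    reinforcement = r_extrinsic + r_intrinsic
    return reinforcement + gamma * v_next - v_current


def intrinsic_reinforcement(event_occurred, predicted_probability):
    """Hypothetical intrinsic term: surprise at an environmental change.

    It is large when the change is unexpected and shrinks toward zero as
    the predicted probability of the change approaches 1.
    """
    event = 1.0 if event_occurred else 0.0
    return event * (1.0 - predicted_probability)


def update_predictor(predicted_probability, event_occurred, lr=0.1):
    """Move the event prediction toward what was actually observed."""
    event = 1.0 if event_occurred else 0.0
    return predicted_probability + lr * (event - predicted_probability)


def intrinsic_trace(steps=50, lr=0.1):
    """Intrinsic reinforcement over repeated occurrences of one event."""
    p = 0.0
    trace = []
    for _ in range(steps):
        trace.append(intrinsic_reinforcement(True, p))
        p = update_predictor(p, True, lr)
    return trace


if __name__ == "__main__":
    trace = intrinsic_trace()
    print("first intrinsic reinforcement:", trace[0])
    print("last intrinsic reinforcement:", trace[-1])
    print("TD error, both terms:", td_error(1.0, trace[0], 0.0, 0.0))
```

In this sketch the intrinsic term is temporary because it decays once the event becomes predictable, whereas an extrinsic reward passed as `r_extrinsic` keeps contributing for as long as it is delivered.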


Bibliographic record

Source
Neural Networks: The Official Journal of the International Neural Network Society, 2013 (issue not specified), 12 pages
Authors
Mirolli M.; Santucci V.G.; Baldassarre G.
Original format: PDF
Text language: English
Chinese Library Classification: Neurology
Keywords
Actor-critic; Computational model; Intrinsic motivation; Phasic dopamine; Reinforcement learning; TD learning;


Similar literature

1. Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcements driving both action acquisition and reward maximization: A simulated robotic study [J]. Mirolli M., Santucci V.G., Baldassarre G. Neural Networks: The Official Journal of the International Neural Network Society. 2013 (issue not specified)

2. Neural response to action and reward prediction errors: Comparing the error-related negativity to behavioral errors and the feedback-related negativity to reward prediction violations [J]. Potts G.F., Martin L.E., Kamp S.-M. Psychophysiology. 2011, No. 2

3. Inferring reward prediction errors in patients with schizophrenia: a dynamic reward task for reinforcement learning [J]. Chia-Tzu Li, Wen-Sung Lai, Chih-Min Liu. Frontiers in Psychology. 2014, No. 4

4. Constrained reinforcement learning from intrinsic and extrinsic rewards [C]. Uchibe Eiji, Doya Kenji. ICDL IEEE International Conference on Development and Learning. 2007

5. Reward Prediction Errors Shape Memory during Reinforcement Learning [D]. Rouhani, Nina. 2020

6. Correction for Glimcher, Understanding dopamine and reinforcement learning: The dopamine reward prediction error hypothesis [O]. 2011

7. Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcement driving both action acquisition and reward maximization: A simulated robotic study [O]. Mirolli Marco, Santucci Vieri Giuliano, Baldassarre Gianluca. 2013
