Neural Networks: The Official Journal of the International Neural Network Society

Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcements driving both action acquisition and reward maximization: A simulated robotic study

Abstract

An important issue in recent neuroscientific research is understanding the functional role of the phasic release of dopamine in the striatum, and in particular its relation to reinforcement learning. The literature is split between two alternative hypotheses. One considers phasic dopamine a reward prediction error, similar to the computational TD error, whose function is to guide an animal to maximize future rewards. The other holds that phasic dopamine is a sensory prediction error signal that lets the animal discover and acquire novel actions. In this paper we propose an original hypothesis that integrates these two contrasting positions: in our view, phasic dopamine represents a TD-like reinforcement prediction error learning signal determined both by unexpected changes in the environment (temporary, intrinsic reinforcements) and by biological rewards (permanent, extrinsic reinforcements). Accordingly, dopamine plays the functional role of driving both the discovery and acquisition of novel actions and the maximization of future rewards. To validate our hypothesis, we perform a series of experiments with a simulated robotic system that has to learn different skills in order to get rewards. We compare different versions of the system in which we vary the composition of the learning signal. The results show that only the system reinforced by both extrinsic and intrinsic reinforcements reaches high performance in sufficiently complex conditions. © 2013 Elsevier Ltd.
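The idea of a single TD-like signal fed by both kinds of reinforcement can be illustrated with a minimal sketch. It is not the authors' model, and every modelling choice below is an assumption made only for illustration. These assumptions include a tabular TD(0) update on one start state that leads to a terminal state, and intrinsic reinforcement taken as the positive prediction error of a hypothetical `EventPredictor` that learns to anticipate an action-triggered change. The names `td_error`, `EventPredictor` and `run_trials`, and the parameters `lr`, `alpha` and `gamma`, are likewise hypothetical.

```python
from dataclasses import dataclass


def td_error(r_ext: float, r_int: float, v_s: float, v_next: float,
             gamma: float = 0.9) -> float:
    """TD error whose reinforcement term sums extrinsic and intrinsic parts:
    delta = (r_ext + r_int) + gamma * V(s') - V(s)."""
    return (r_ext + r_int) + gamma * v_next - v_s


@dataclass
class EventPredictor:
    """Hypothetical predictor of an environmental change triggered by an action.

    Its positive prediction error is used as intrinsic reinforcement (an assumption):
    large while the change is unexpected, shrinking toward zero as it becomes predictable.
    """
    lr: float = 0.5   # learning rate of the predictor (assumed value)
    p: float = 0.0    # current prediction that the change will occur

    def intrinsic_reinforcement(self, event: float) -> float:
        err = event - self.p          # sensory prediction error
        self.p += self.lr * err       # the predictor learns, so the surprise fades
        return max(err, 0.0)          # only unexpected occurrences reinforce


def run_trials(n: int, r_ext: float = 0.0, alpha: float = 0.1, gamma: float = 0.9):
    """Repeat one action that always triggers the same change, then ends the episode.

    Learns V(start) with TD(0); returns the per-trial (r_int, delta) pairs and the final V.
    """
    predictor = EventPredictor()
    v_start = 0.0
    history = []
    for _ in range(n):
        r_int = predictor.intrinsic_reinforcement(1.0)  # the change happens every trial
        # The next state is terminal, so its value is 0.
        delta = td_error(r_ext, r_int, v_start, v_next=0.0, gamma=gamma)
        v_start += alpha * delta
        history.append((r_int, delta))
    return history, v_start


if __name__ == "__main__":
    hist, v = run_trials(5)
    print([round(r, 4) for r, _ in hist])  # prints [1.0, 0.5, 0.25, 0.125, 0.0625]
```

With `r_ext=0.0` the learned value rises briefly and then drifts back toward zero once the change becomes predictable, because the intrinsic reinforcement is temporary. With `r_ext=1.0` the value settles near 1.0, because the extrinsic reinforcement is permanent.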
