Venue: Annual Neural Information Processing Systems Conference

Timing and Partial Observability in the Dopamine System

Abstract

According to a series of influential models, dopamine (DA) neurons signal reward prediction error using a temporal-difference (TD) algorithm. We address a problem not convincingly solved in these accounts: how to maintain a representation of cues that predict delayed consequences. Our new model uses a TD rule grounded in partially observable semi-Markov processes, a formalism that captures two largely neglected features of DA experiments: hidden state and temporal variability. Previous models predicted rewards using a tapped delay line representation of sensory inputs; we replace this with a more active process of inference about the underlying state of the world. The DA system can then learn to map these inferred states to reward predictions using TD. The new model can explain previously vexing data on the responses of DA neurons in the face of temporal variability. By combining statistical model-based learning with a physiologically grounded TD theory, it also brings into contact with physiology some insights about behavior that had previously been confined to more abstract psychological models.
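
The two ingredients named in the abstract, inference over hidden state and a TD rule applied to the inferred state, can be illustrated with a minimal sketch. It is not the paper's model: it assumes a discrete-time hidden Markov model rather than a semi-Markov process, and a value function that is linear in the belief. The names `belief_update`, `td_update`, `T`, `O`, `alpha`, and `gamma` are hypothetical, chosen for illustration only.

```python
# Illustrative sketch only: a discrete-time HMM stands in for the
# partially observable semi-Markov process described in the abstract.

def belief_update(belief, T, O, obs):
    """One step of Bayesian filtering over hidden states.

    T[i][j] = P(next state j | current state i)   (assumed known)
    O[j][k] = P(observation k | state j)          (assumed known)
    Assumes obs has nonzero probability under the predicted belief.
    """
    n = len(belief)
    # Predict: push the current belief through the transition model.
    predicted = [sum(belief[i] * T[i][j] for i in range(n)) for j in range(n)]
    # Correct: weight each state by the likelihood of the observation.
    posterior = [predicted[j] * O[j][obs] for j in range(n)]
    total = sum(posterior)
    return [p / total for p in posterior]


def value(w, belief):
    """Value estimate that is linear in the belief: V(b) = w . b."""
    return sum(wi * bi for wi, bi in zip(w, belief))


def td_update(w, b, b_next, reward, alpha=0.1, gamma=0.95):
    """TD(0) update on belief features; delta is the prediction error."""
    delta = reward + gamma * value(w, b_next) - value(w, b)
    new_w = [wi + alpha * delta * bi for wi, bi in zip(w, b)]
    return new_w, delta


# Toy two-state world with two possible observations (made-up numbers).
T = [[0.9, 0.1], [0.2, 0.8]]
O = [[0.8, 0.2], [0.3, 0.7]]
b0 = [0.5, 0.5]
b1 = belief_update(b0, T, O, obs=1)
w, delta = td_update([0.0, 0.0], b0, b1, reward=1.0)
```

In this sketch `delta` plays the role the abstract assigns to the DA signal, a reward prediction error, and the weights `w` map inferred states to reward predictions in place of a tapped delay line.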
