PLoS Computational Biology

Reinforcement Learning Explains Conditional Cooperation and Its Moody Cousin



Abstract

Direct reciprocity, or repeated interaction, is a main mechanism for sustaining cooperation in social dilemmas involving two individuals. For larger groups and networks, which are probably more relevant to understanding and engineering our society, experiments employing repeated multiplayer social dilemma games have suggested that humans often show conditional cooperation behavior and its moody variant. The mechanisms underlying these behaviors remain largely unclear. Here we provide a proximate account of these behaviors by showing that individuals adopting a type of reinforcement learning, called aspiration learning, phenomenologically behave as conditional cooperators. By definition, individuals are satisfied if and only if the obtained payoff is larger than a fixed aspiration level. They reinforce actions that have resulted in satisfactory outcomes and anti-reinforce those yielding unsatisfactory outcomes. The results obtained in the present study are general in that they explain extant experimental results obtained for both so-called moody and non-moody conditional cooperation, for both prisoner's dilemma and public goods games, and for both well-mixed groups and networks. Unlike in previous theory, individuals are assumed to have no access to information about what other individuals are doing, so they cannot explicitly use conditional cooperation rules. In this sense, myopic aspiration learning, in which the unconditional propensity to cooperate is modulated in every discrete time step, explains the conditional behavior of humans. Aspiration learners showing (moody) conditional cooperation obeyed a noisy GRIM-like strategy. This differs from Pavlov, a reinforcement learning strategy that promotes mutual cooperation in two-player situations.
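The aspiration-learning rule described in the abstract can be illustrated with a minimal sketch. The abstract does not give the update equations, so the code below assumes a Bush-Mosteller-style rule, which is one common way to formalize aspiration learning; the tanh stimulus, the `sensitivity` parameter, the prisoner's dilemma payoffs (R=3, S=0, T=5, P=1), the aspiration level of 2.0, and all function names are illustrative assumptions rather than the authors' specification. Each player sees only its own action and payoff, in line with the assumption that individuals have no information about what others are doing.

```python
import math
import random


def stimulus(payoff, aspiration, sensitivity=1.0):
    """Map (payoff - aspiration) into (-1, 1); positive means 'satisfied'.

    The tanh form and `sensitivity` are assumptions made for this sketch.
    """
    return math.tanh(sensitivity * (payoff - aspiration))


def update_propensity(p, cooperated, s):
    """Bush-Mosteller-style update of p, the probability of cooperating.

    A satisfactory outcome (s >= 0) reinforces the action just taken;
    an unsatisfactory one (s < 0) anti-reinforces it.
    """
    if cooperated:
        # Satisfied: push p toward 1. Dissatisfied: push p toward 0.
        return p + (1.0 - p) * s if s >= 0 else p + p * s
    # Defected. Satisfied: push p toward 0. Dissatisfied: push p toward 1.
    return p - p * s if s >= 0 else p - (1.0 - p) * s


def pd_payoff(my_c, other_c, R=3.0, S=0.0, T=5.0, P=1.0):
    """Prisoner's dilemma payoff to the focal player (values are assumed)."""
    if my_c:
        return R if other_c else S
    return T if other_c else P


def simulate(rounds=50, aspiration=2.0, p0=0.5, seed=0):
    """Two aspiration learners playing a repeated prisoner's dilemma."""
    rng = random.Random(seed)
    p = [p0, p0]
    history = []
    for _ in range(rounds):
        acts = [rng.random() < p[0], rng.random() < p[1]]
        for i in (0, 1):
            # Each player uses only its own action and its own payoff.
            payoff = pd_payoff(acts[i], acts[1 - i])
            p[i] = update_propensity(p[i], acts[i], stimulus(payoff, aspiration))
        history.append(tuple(acts))
    return p, history
```

Calling `simulate()` returns the final cooperation propensities of both players and the per-round action history. The conditional probability of cooperating given the number of cooperating partners in the previous round could then be estimated from `history`, even though no player ever conditions on that information directly.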
