NASH EQUILIBRIUM STRATEGY AND SOCIAL NETWORK CONSENSUS EVOLUTION MODEL IN CONTINUOUS ACTION SPACE
Abstract
Provided in the present invention are a Nash equilibrium strategy and a social network consensus evolution model in a continuous action space, belonging to the field of reinforcement learning methods. The strategy of the present invention comprises the following steps: initializing parameters; with a certain exploration rate, randomly selecting an action x_i according to a normal distribution N(u_i, σ_i); executing the action and then obtaining a return r_i from the environment; if the return r_i acquired by agent i after executing the action x_i is greater than the current cumulative average return Q_i, taking the learning rate of u_i to be α_ub, and otherwise taking it to be α_us; according to the selected learning rate, updating u_i, the variance σ_i, and Q_i; and finally updating the cumulative average strategy (I). If the cumulative average strategy (I) converges, it is output as the final action of agent i. The beneficial effect of the present invention is that each agent maximizes its own interest in the process of interacting with other agents and finally learns the Nash equilibrium.
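The abstract gives no code, so the following is only a minimal Python sketch of the update rule it describes, under stated assumptions: a single agent, a toy payoff function, and illustrative values for α_ub, α_us, and the variance/average update rates. All names (`step`, `x_reward`, the α constants) are hypothetical, and the variance here is simply annealed rather than updated by the patent's (unstated) rule.

```python
import random

def step(u, sigma, Q, x_reward, alpha_ub=0.1, alpha_us=0.01,
         alpha_sigma=0.005, alpha_Q=0.05):
    """One learning step: Gaussian action selection with an
    asymmetric learning rate, as sketched in the abstract."""
    x = random.gauss(u, sigma)                 # sample action x_i ~ N(u_i, sigma_i)
    r = x_reward(x)                            # execute, observe return r_i
    alpha_u = alpha_ub if r > Q else alpha_us  # bigger rate when r_i beats Q_i
    u += alpha_u * (x - u)                     # pull the policy mean toward x
    sigma = max(0.01, sigma - alpha_sigma * sigma)  # anneal exploration (assumption)
    Q += alpha_Q * (r - Q)                     # running cumulative average return
    return u, sigma, Q

# Toy usage: a single agent facing a payoff peaked at x = 1.
random.seed(0)
u, sigma, Q = 0.0, 1.0, -5.0
reward = lambda x: -(x - 1.0) ** 2
for _ in range(3000):
    u, sigma, Q = step(u, sigma, Q, reward)
```

With this rule, samples whose return exceeds the running average pull the mean u more strongly, so u drifts toward the high-payoff region while the average return Q improves; in the multi-agent setting of the patent, the corresponding fixed point is claimed to be the Nash equilibrium.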