NASH EQUILIBRIUM STRATEGY AND SOCIAL NETWORK CONSENSUS EVOLUTION MODEL IN CONTINUOUS ACTION SPACE
Abstract
Provided in the present invention are a Nash equilibrium strategy and a social network consensus evolution model in a continuous action space, belonging to the field of reinforcement learning methods. The strategy of the present invention comprises the following steps: initializing parameters; with a certain exploration rate, randomly selecting an action x_i according to a normal distribution N(u_i, σ_i); executing the action and then obtaining a return r_i from the environment; if the return r_i acquired by agent i after executing the action x_i is greater than the current cumulative average return Q_i, taking the learning rate of u_i to be α_ub, and otherwise taking it to be α_us; according to the selected learning rate, updating u_i, the variance σ_i, and Q_i; and finally updating the cumulative average strategy (I). If the cumulative average strategy (I) converges, it is output as the final action of agent i. The beneficial effect of the present invention is that each agent maximizes its own interest in the process of interacting with other agents and finally learns the Nash equilibrium.
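The abstract gives no code, so the following is only a minimal Python sketch of the update rule it describes, under stated assumptions: a single agent, a toy payoff function, and illustrative values for α_ub, α_us, and the variance/average update rates. All names (`step`, `x_reward`, the α constants) are hypothetical, and the variance here is simply annealed rather than updated by the patent's (unstated) rule.

```python
import random

def step(u, sigma, Q, x_reward, alpha_ub=0.1, alpha_us=0.01,
         alpha_sigma=0.005, alpha_Q=0.05):
    """One learning step: Gaussian action selection with an
    asymmetric learning rate, as sketched in the abstract."""
    x = random.gauss(u, sigma)                 # sample action x_i ~ N(u_i, sigma_i)
    r = x_reward(x)                            # execute, observe return r_i
    alpha_u = alpha_ub if r > Q else alpha_us  # bigger rate when r_i beats Q_i
    u += alpha_u * (x - u)                     # pull the policy mean toward x
    sigma = max(0.01, sigma - alpha_sigma * sigma)  # anneal exploration (assumption)
    Q += alpha_Q * (r - Q)                     # running cumulative average return
    return u, sigma, Q

# Toy usage: a single agent facing a payoff peaked at x = 1.
random.seed(0)
u, sigma, Q = 0.0, 1.0, -5.0
reward = lambda x: -(x - 1.0) ** 2
for _ in range(3000):
    u, sigma, Q = step(u, sigma, Q, reward)
```

With this rule, samples whose return exceeds the running average pull the mean u more strongly, so u drifts toward the high-payoff region while the average return Q improves; in the multi-agent setting of the patent, the corresponding fixed point is claimed to be the Nash equilibrium.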