Frontiers of Computer Science

A Monte Carlo Neural Fictitious Self-Play approach to approximate Nash Equilibrium in imperfect-information dynamic games

Abstract

Solving the optimization problem to approach a Nash Equilibrium point plays an important role in imperfect-information games, e.g., StarCraft and poker. Neural Fictitious Self-Play (NFSP) is an effective algorithm that learns an approximate Nash Equilibrium of imperfect-information games purely from self-play, without prior domain knowledge. However, it needs to train a neural network in an off-policy manner to approximate the action values. For games with large search spaces, the training may suffer from unnecessary exploration and sometimes fails to converge. In this paper, we propose a new Neural Fictitious Self-Play algorithm that combines Monte Carlo tree search with NFSP, called MC-NFSP, to improve performance in real-time zero-sum imperfect-information games. With experiments and empirical analysis, we demonstrate that the proposed MC-NFSP algorithm can approximate a Nash Equilibrium in games with large-scale search depth, while NFSP cannot. Furthermore, we develop an Asynchronous Neural Fictitious Self-Play framework (ANFSP). It uses an asynchronous and parallel architecture to collect game experience and improves both training efficiency and policy quality. Experiments with games with hidden state information (Texas Hold'em) and FPS (first-person shooter) games demonstrate the effectiveness of our algorithms.
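To make the abstract's description concrete, the sketch below shows the basic NFSP agent structure that MC-NFSP builds on: an off-policy best-response (action-value) network, an average-policy network trained on the agent's own best-response actions, and anticipatory mixing between the two. This is a minimal illustrative sketch only; the network sizes, the mixing parameter `eta`, and the random-transition demo loop are assumptions for illustration, not the paper's actual MC-NFSP implementation (which additionally guides the best response with Monte Carlo tree search).

```python
# Minimal NFSP-style agent sketch (illustrative assumptions, not the paper's code).
import random
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class NFSPAgent:
    def __init__(self, state_dim, n_actions, eta=0.1):
        self.n_actions = n_actions
        self.eta = eta                      # probability of playing the best response
        # Best-response network: trained off-policy (Q-learning) on RL memory.
        self.q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                   nn.Linear(64, n_actions))
        # Average-policy network: supervised on the agent's own best-response actions.
        self.pi_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                    nn.Linear(64, n_actions))
        self.q_opt = torch.optim.Adam(self.q_net.parameters(), lr=1e-3)
        self.pi_opt = torch.optim.Adam(self.pi_net.parameters(), lr=1e-3)
        self.rl_memory = []                 # (s, a, r, s', done) transitions
        self.sl_memory = []                 # (s, a) best-response pairs

    def act(self, state):
        s = torch.as_tensor(state, dtype=torch.float32)
        if random.random() < self.eta:
            # Play the (approximate) best response and record it for supervised learning.
            a = int(self.q_net(s).argmax())
            self.sl_memory.append((state, a))
        else:
            # Play the average policy.
            probs = F.softmax(self.pi_net(s), dim=-1)
            a = int(torch.multinomial(probs, 1))
        return a

    def train_step(self, batch_size=32, gamma=1.0):
        # Off-policy Q-learning update on sampled transitions.
        if len(self.rl_memory) >= batch_size:
            s, a, r, s2, done = map(np.array, zip(*random.sample(self.rl_memory, batch_size)))
            q = self.q_net(torch.as_tensor(s, dtype=torch.float32))
            q = q.gather(1, torch.as_tensor(a).long().unsqueeze(1)).squeeze(1)
            with torch.no_grad():
                q2 = self.q_net(torch.as_tensor(s2, dtype=torch.float32)).max(1).values
                target = (torch.as_tensor(r, dtype=torch.float32)
                          + gamma * (1 - torch.as_tensor(done, dtype=torch.float32)) * q2)
            loss_q = F.mse_loss(q, target)
            self.q_opt.zero_grad(); loss_q.backward(); self.q_opt.step()
        # Supervised update of the average policy toward past best-response actions.
        if len(self.sl_memory) >= batch_size:
            s, a = map(np.array, zip(*random.sample(self.sl_memory, batch_size)))
            logits = self.pi_net(torch.as_tensor(s, dtype=torch.float32))
            loss_pi = F.cross_entropy(logits, torch.as_tensor(a).long())
            self.pi_opt.zero_grad(); loss_pi.backward(); self.pi_opt.step()

# Toy usage with random transitions, only to show the self-play training loop shape.
agent = NFSPAgent(state_dim=4, n_actions=3)
for _ in range(100):
    s, s2 = np.random.rand(4), np.random.rand(4)
    a = agent.act(s)
    agent.rl_memory.append((s, a, np.random.rand(), s2, False))
    agent.train_step()
```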
