IEEE Transactions on Games

Reinforcement Learning to Create Value and Policy Functions Using Minimax Tree Search in Hex



Abstract

Recently, reinforcement-learning algorithms have been proposed for creating value and policy functions, and their effectiveness has been demonstrated in Go, chess, and shogi. In previous studies, the policy function was trained to predict the search probabilities of each move output by Monte Carlo tree search; thus, a number of simulations were required to obtain the search probabilities. We propose a reinforcement-learning algorithm based on self-play games that creates value and policy functions such that the policy function is trained directly from the game results, without the search probabilities. In this study, we use Hex, a board game developed by Piet Hein, to evaluate the proposed method. We demonstrate the effectiveness of the proposed learning algorithm in terms of policy function accuracy, and we play a tournament between the proposed computer Hex algorithm DeepEZO and the 2017 world-champion programs. The tournament results demonstrate that DeepEZO outperforms all of these programs. DeepEZO achieved a winning percentage of 79.3% against the world-champion program MoHex2.0 under the same search conditions on a $13 \times 13$ board. We also show that highly accurate policy functions can be created by training the policy functions to increase the number of moves to be searched in losing positions.
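
The contrast between the two kinds of policy target can be illustrated with a small sketch. The abstract does not give the loss actually used for DeepEZO, so everything below is an assumption made for illustration only: `search_target_loss` is the familiar cross-entropy against Monte Carlo tree search probabilities used in the previous studies, while `result_target_loss` is a hypothetical outcome-weighted log-likelihood of the move played, which needs only the game result. The function names and the `outcome` encoding (+1 for a win, -1 for a loss) are not from the paper.

```python
import math


def softmax(logits):
    """Convert raw move scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def search_target_loss(logits, search_probs):
    """Cross-entropy between the policy and tree-search probabilities.

    Requires running many simulations per position to obtain search_probs.
    """
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(search_probs, probs) if t > 0)


def result_target_loss(logits, move_played, outcome):
    """Hypothetical loss that uses only the game result (an assumption).

    outcome is +1 if the player who made move_played won, -1 if they lost.
    Minimizing it raises the probability of moves from won games and
    lowers the probability of moves from lost games.
    """
    probs = softmax(logits)
    return -outcome * math.log(probs[move_played])


if __name__ == "__main__":
    logits = [0.0, 0.0, 0.0]  # three legal moves, uniform policy
    print(search_target_loss(logits, [0.5, 0.5, 0.0]))
    print(result_target_loss(logits, move_played=1, outcome=+1))
    print(result_target_loss(logits, move_played=1, outcome=-1))
```

The second loss needs no search statistics at all, which is the property the abstract emphasizes; how the authors combine the game result with minimax tree search is not stated in the abstract and is not modeled here.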
