Reinforcement Learning to Create Value and Policy Functions Using Minimax Tree Search in Hex

Takada Kei; Iizuka Hiroyuki; Yamamoto Masahito


Abstract

Reinforcement-learning algorithms have recently been proposed for creating value and policy functions, and their effectiveness has been demonstrated in Go, chess, and shogi. In previous studies, the policy function was trained to predict the search probabilities of each move output by Monte Carlo tree search; thus, a number of simulations were required to obtain the search probabilities. We propose a reinforcement-learning algorithm based on self-play games that creates value and policy functions such that the policy function is trained directly from the game results, without the search probabilities. In this study, we use Hex, a board game developed by Piet Hein, to evaluate the proposed method. We demonstrate the effectiveness of the proposed learning algorithm in terms of policy function accuracy, and we play a tournament between the proposed computer Hex algorithm DeepEZO and the 2017 world-champion programs. The tournament results demonstrate that DeepEZO outperforms all of these programs. DeepEZO achieved a winning percentage of 79.3% against the world-champion program MoHex2.0 under the same search conditions on a $13 \times 13$ board. We also show that highly accurate policy functions can be created by training the policy functions to increase the number of moves to be searched in losing positions.
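As a rough illustration of the distinction the abstract draws, the following Python sketch contrasts two kinds of policy training target. The first function normalizes Monte Carlo tree search visit counts into search probabilities, in the style familiar from AlphaZero-type training. The second is a purely hypothetical stand-in for a target derived from the game result alone; the abstract does not give the authors' actual rule, so this should not be read as DeepEZO's method. All function and parameter names here are assumptions made for illustration.

```python
from typing import Dict, List


def mcts_policy_target(visit_counts: Dict[int, int],
                       temperature: float = 1.0) -> Dict[int, float]:
    """Search probabilities proportional to N(a)^(1/T) over MCTS visit counts.

    Obtaining visit_counts requires running many simulations per position,
    which is the cost the abstract attributes to previous studies.
    """
    weights = {a: n ** (1.0 / temperature) for a, n in visit_counts.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one move must have a nonzero visit count")
    return {a: w / total for a, w in weights.items()}


def outcome_policy_target(legal_moves: List[int],
                          played_move: int,
                          mover_won: bool) -> Dict[int, float]:
    """HYPOTHETICAL outcome-based target (an assumption, not the paper's rule).

    If the player to move went on to win, reinforce the move actually played.
    If that player lost, spread probability over all legal moves so that more
    moves are considered in the losing position.
    """
    if played_move not in legal_moves:
        raise ValueError("played_move must be one of legal_moves")
    if mover_won:
        return {a: 1.0 if a == played_move else 0.0 for a in legal_moves}
    return {a: 1.0 / len(legal_moves) for a in legal_moves}


if __name__ == "__main__":
    # Target built from search statistics (needs simulations).
    print(mcts_policy_target({0: 30, 1: 10}))
    # Targets built from the game result only (no search probabilities).
    print(outcome_policy_target([0, 1, 2], played_move=1, mover_won=True))
    print(outcome_policy_target([0, 1, 2], played_move=1, mover_won=False))
```

The point of the contrast is only that the second kind of target needs no per-move search statistics; how the paper actually shapes the target in winning and losing positions is described in the full text, not in this record.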


Bibliographic details

Source: IEEE Transactions on Games, 2020, No. 1, pp. 63-73 (11 pages)
Authors: Takada Kei; Iizuka Hiroyuki; Yamamoto Masahito
Affiliation (all three authors): Hokkaido Univ, Grad Sch Informat Sci & Technol, Sapporo, Hokkaido 0600814, Japan
Original format: PDF
Language: English
Keywords: Hex; policy function; reinforcement learning; value function

