
Consistency Modifications for Automatically Tuned Monte-Carlo Tree Search


Abstract

Monte-Carlo Tree Search algorithms (MCTS [4,6]), including upper confidence trees (UCT [9]), are known for their strong performance on high-dimensional control problems. Although the main testbed is the game of Go, applications are increasingly numerous [13,12,7], and these algorithms are now widely accepted as strong candidates for high-dimensional control. Unfortunately, MCTS is known to require some tuning for optimal performance on a given problem. This tuning is either handcrafted or automated, and in some cases it causes a loss of consistency, i.e. bad asymptotic behavior as the computational power grows. This highly undesirable property led to poor play by our main MCTS program, MoGo, in a real-world situation described in Section 3. It is a serious problem for our work on automatic parameter tuning [3] and on the genetic programming of new features in MoGo. In this paper we present:
- a theoretical analysis of MCTS consistency;
- detailed examples of consistent and inconsistent known algorithms;
- a way to modify an MCTS implementation to ensure consistency, independently of modifications to the "scoring" module (the module that is automatically tuned and genetically programmed in MoGo);
- as a by-product, the interesting property that some heavily tuned MCTS implementations are better than UCT in the sense that they do not visit the complete tree (whereas UCT asymptotically does), while preserving consistency, at least if the "consistency" modifications above have been made.
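To make the notion of consistency concrete, the following minimal sketch contrasts a purely greedy score with the standard UCB1 score (the rule underlying UCT) on a toy two-move decision. It is not the modification proposed in the paper and is not taken from MoGo: the reward function, the helper names (`reward`, `run`, `greedy_score`, `ucb1_score`), and the budget of 1000 simulations are all assumptions chosen for illustration.

```python
import math

def reward(arm, pulls_so_far):
    # Hypothetical deterministic rewards (an assumption for illustration):
    # move 0 always pays 0.4; move 1 pays 0 on its first trial, then 1.
    if arm == 0:
        return 0.4
    return 0.0 if pulls_so_far == 0 else 1.0

def run(score, budget):
    """Simulate `budget` trials, always picking the move with the best score."""
    counts = [0, 0]
    totals = [0.0, 0.0]
    for t in range(1, budget + 1):
        untried = [a for a in range(2) if counts[a] == 0]
        if untried:
            arm = untried[0]  # try every move once before scoring
        else:
            arm = max(range(2),
                      key=lambda a: score(totals[a] / counts[a], counts[a], t))
        totals[arm] += reward(arm, counts[arm])
        counts[arm] += 1
    return counts

def greedy_score(mean, n, t):
    # No exploration term: a move that looks bad once is never revisited.
    return mean

def ucb1_score(mean, n, t):
    # UCB1: the bonus grows with log(t), so every move is eventually retried.
    return mean + math.sqrt(2.0 * math.log(t) / n)

greedy_counts = run(greedy_score, 1000)
ucb_counts = run(ucb1_score, 1000)
print("greedy visits:", greedy_counts)
print("UCB1 visits:  ", ucb_counts)
```

Under these assumptions the greedy score stays on the worse move however large the budget, which is the kind of asymptotic failure the abstract calls a loss of consistency, whereas the UCB1 bonus forces a retry of the initially unlucky move and then concentrates on it.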

Bibliographic record

  • Source
    Learning and Intelligent Optimization | 2010 | pp. 111-124 | 14 pages
  • Conference location: Venice (IT)
  • Author affiliation

    TAO (Inria), LRI, UMR 8623(CNRS - Univ. Paris-Sud), bat 490 Univ. Paris-Sud 91405 Orsay, France;


  • Original format: PDF
  • Language: English
  • CLC classification: Computer networks
