
Strength Adjustment and Assessment for MCTS-Based Programs [Research Frontier]


Abstract

This paper proposes an approach to strength adjustment and assessment for game-playing programs based on Monte-Carlo tree search (MCTS). We modify an existing softmax policy with a strength index to choose moves. The most important modification is a mechanism that filters out low-quality moves by excluding those whose simulation count is lower than a pre-defined threshold ratio of the maximum simulation count. Through theoretical analysis, we show that, by using a threshold ratio, the adjusted policy is guaranteed to choose moves exceeding a lower bound in strength. Experimental results show that the strength index is highly correlated with the empirical strength. With an index value within ±2, we can cover a strength range of about 800 Elo ratings. The strength adjustment and assessment methods were also tested in real-world scenarios with human players, ranging from professionals (strongest) to kyu rank amateurs (weakest). For amateur levels, we tested our mechanism on two popular online Go platforms, Fox Weiqi and Tygem. The results show that our method can stably adjust program strength to different ranks. In terms of strength assessment, we proposed a new dynamic strength adjustment method, then used it to evaluate human professionals, reliably predicting their playing strengths within 15 games. Lastly, we collected survey responses asking players about strength perception, entertainment, and general comments for different aspects of analysis. To the best of our knowledge, this result is state-of-the-art in terms of the range of strengths in Elo rating while maintaining a controllable relationship between the strength and a strength index.
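The move-selection idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so the following is an assumption: each move is weighted by its simulation count raised to the power of a strength index `z`, after discarding moves whose count falls below `threshold_ratio` times the maximum count. The function names, the parameter names, and the example counts are hypothetical and are not taken from the paper.

```python
import random


def move_probabilities(sim_counts, z, threshold_ratio):
    """Return a dict mapping each surviving move to its selection probability.

    sim_counts:      dict of move -> MCTS simulation count (hypothetical input).
    z:               strength index; larger values favour the most-simulated
                     move, negative values favour less-simulated moves.
    threshold_ratio: moves with count < threshold_ratio * max count are dropped.
    Assumption: weight(move) is proportional to count ** z.
    """
    n_max = max(sim_counts.values(), default=0)
    if n_max <= 0:
        raise ValueError("at least one move needs a positive simulation count")

    # Filter step: keep only moves that were simulated often enough.
    candidates = {
        move: n
        for move, n in sim_counts.items()
        if n > 0 and n >= threshold_ratio * n_max
    }

    # Normalise counts by n_max before exponentiation for numerical stability;
    # this does not change the resulting probabilities.
    weights = {move: (n / n_max) ** z for move, n in candidates.items()}
    total = sum(weights.values())
    return {move: w / total for move, w in weights.items()}


def select_move(sim_counts, z, threshold_ratio, rng=random):
    """Sample one move according to move_probabilities."""
    probs = move_probabilities(sim_counts, z, threshold_ratio)
    moves = list(probs)
    return rng.choices(moves, weights=[probs[m] for m in moves], k=1)[0]


if __name__ == "__main__":
    counts = {"D4": 100, "Q16": 50, "A1": 5}  # made-up simulation counts
    for z in (-1.0, 0.0, 1.0):
        print(z, move_probabilities(counts, z, threshold_ratio=0.1))
```

In this sketch the move with the maximum count always survives the filter, so the candidate set is never empty. The filter is what keeps a negative `z` from selecting arbitrarily poor moves: with `threshold_ratio=0.1`, the move `A1` above is never chosen, whatever the value of `z`.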

Bibliographic record

  • Source
    IEEE Computational Intelligence Magazine | 2020, Issue 3 | pp. 60-73 | 14 pages
  • Author affiliations

    Natl Chiao Tung Univ Dept Comp Sci Hsinchu Taiwan;

    Natl Chiao Tung Univ Dept Comp Sci Hsinchu Taiwan;

    Natl Chiao Tung Univ Dept Comp Sci Hsinchu Taiwan;

    Natl Chiao Tung Univ Dept Comp Sci Hsinchu Taiwan;

    Natl Chiao Tung Univ Dept Comp Sci Hsinchu Taiwan;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords
