20th International Conference on Machine Learning

The Significance of Temporal-Difference Learning in Self-Play Training: TD-rummy versus EVO-rummy



Abstract

Reinforcement learning has been used to train game-playing agents. The value function for a complex game must be approximated with a continuous function, because the number of states is too large to enumerate. Temporal-difference learning with self-play is one method that has been used successfully to derive the value-approximation function. Co-evolution of the value function has also been claimed to yield good results. This paper reports a direct comparison between an agent trained to play gin rummy using temporal-difference learning and the same agent trained with co-evolution. Co-evolution produced superior results.
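
To make the first training method concrete, here is a minimal sketch of a TD(0) update with a linear value approximator, the general technique the abstract describes. The function name, feature encoding, and step-size values are illustrative assumptions, not the paper's actual TD-rummy implementation.

```python
import numpy as np

def td0_update(w, phi_s, phi_s_next, reward, alpha=0.01, gamma=1.0,
               terminal=False):
    """One TD(0) step on a linear value estimate V(s) = w . phi(s).

    phi_s, phi_s_next: hypothetical feature vectors for the current
    and next game state (e.g., an encoding of the agent's hand).
    """
    v_s = w @ phi_s
    v_next = 0.0 if terminal else w @ phi_s_next
    delta = reward + gamma * v_next - v_s   # temporal-difference error
    return w + alpha * delta * phi_s        # move V(s) toward r + gamma*V(s')
```

In self-play training, both sides of the game would share the same weight vector, this update would run after every transition, and the reward would be zero until the terminal state, where it reflects the outcome of the hand.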
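
For contrast, a minimal sketch of the co-evolutionary alternative: a population of weight vectors is scored by round-robin play within the population itself, and the fittest half survives alongside mutated copies. The `play_match` placeholder and all parameters are assumptions for illustration; the paper's EVO-rummy setup may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def play_match(w_a, w_b):
    # Placeholder: a real run would play full games of gin rummy,
    # with each weight vector scoring candidate moves for its side.
    return 1 if rng.random() < 0.5 else -1

def coevolve(pop_size=20, dim=64, generations=100, sigma=0.05):
    pop = rng.normal(size=(pop_size, dim))
    best = pop[0]
    for _ in range(generations):
        # Fitness comes from play inside the population, so the
        # evaluation standard evolves along with the players.
        fitness = np.zeros(pop_size)
        for i in range(pop_size):
            for j in range(i + 1, pop_size):
                result = play_match(pop[i], pop[j])
                fitness[i] += result
                fitness[j] -= result
        order = np.argsort(fitness)
        elite = pop[order[pop_size // 2:]]  # top half by fitness
        best = elite[-1]                    # best individual this generation
        # Refill the population with mutated copies of the elite.
        children = elite + sigma * rng.normal(size=elite.shape)
        pop = np.vstack([elite, children])
    return best
```

Note the key design difference between the two methods: TD learning updates the value function from the error on each transition, while co-evolution never computes a value error at all and selects only on game outcomes.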
