Conference: International Conference on Case-Based Reasoning

Imitating Inscrutable Enemies: Learning from Stochastic Policy Observation, Retrieval and Reuse

Abstract

In this paper we study case-based reasoning (CBR) systems that learn from observations, where those observations can be represented as stochastic policies. We describe a general framework with three steps: (1) The agent observes other agents performing actions, elicits stochastic policies representing those agents' strategies, and retains these policies as cases. (2) The agent analyzes the environment and retrieves a suitable stochastic policy. (3) The agent executes the retrieved stochastic policy, and so mimics the previously observed agent. We implement the framework in a system called JuKeCB that observes and mimics players playing games. We present the results of three sets of experiments designed to evaluate the framework. The first experiment demonstrates that JuKeCB performs well when trained against a variety of fixed-strategy opponents. The second demonstrates that, after training, JuKeCB can also win against an opponent with a dynamic strategy. The final experiment demonstrates that JuKeCB can win against "new" opponents (i.e., opponents against which JuKeCB is untrained).
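The three steps in the abstract (observe and retain, retrieve, reuse) can be illustrated with a minimal sketch. This is not JuKeCB's implementation, which the abstract does not detail. It assumes a stochastic policy is a per-state probability distribution over actions estimated from observed action frequencies, that each case is indexed by a numeric feature vector describing the environment, and that retrieval is nearest-neighbor by squared Euclidean distance. The names `elicit_policy`, `CaseBase`, and `act` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def elicit_policy(trace):
    """Estimate a stochastic policy from observed (state, action) pairs.

    Assumption: the policy is the empirical action frequency per state.
    """
    counts = defaultdict(Counter)
    for state, action in trace:
        counts[state][action] += 1
    return {
        state: {a: n / sum(c.values()) for a, n in c.items()}
        for state, c in counts.items()
    }


class CaseBase:
    """Hypothetical case base: each case pairs environment features with a policy."""

    def __init__(self):
        self.cases = []  # list of (features, policy) tuples

    def retain(self, features, trace):
        # Step 1: observe an agent, elicit its policy, store it as a case.
        self.cases.append((tuple(features), elicit_policy(trace)))

    def retrieve(self, features):
        # Step 2: return the policy of the case whose features are closest
        # to the current environment (squared Euclidean distance, an assumption).
        def distance(case):
            return sum((x - y) ** 2 for x, y in zip(case[0], features))

        return min(self.cases, key=distance)[1]


def act(policy, state, rng=random):
    """Step 3: reuse the policy by sampling an action for the given state.

    Raises KeyError if the state was never observed in the stored trace.
    """
    distribution = policy[state]
    actions = list(distribution)
    weights = [distribution[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

A caller would invoke `retain` once per observed player, then call `retrieve` with the current environment features and pass the result to `act` on each decision. Sampling, rather than always picking the most frequent action, is what makes the reused behavior mimic a stochastic strategy instead of a deterministic one.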