Imitating Inscrutable Enemies: Learning from Stochastic Policy Observation, Retrieval and Reuse

Abstract

In this paper we study case-based reasoning (CBR) systems that learn from observations, where those observations can be represented as stochastic policies. We describe a general framework that encompasses three steps: (1) the agent observes other agents performing actions, elicits stochastic policies representing those agents' strategies, and retains these policies as cases; (2) the agent analyzes the environment and retrieves a suitable stochastic policy; (3) the agent executes the retrieved stochastic policy, which results in it mimicking the previously observed agent. We implement our framework in a system called JuKeCB that observes and mimics players playing games. We present the results of three sets of experiments designed to evaluate our framework. The first experiment demonstrates that JuKeCB performs well when trained against a variety of fixed-strategy opponents. The second experiment demonstrates that JuKeCB can also, after training, win against an opponent with a dynamic strategy. The final experiment demonstrates that JuKeCB can win against "new" opponents (i.e., opponents against which JuKeCB is untrained).
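The three steps in the abstract can be illustrated with a small sketch. This is not JuKeCB's implementation, which the abstract does not detail. It assumes that a stochastic policy is a per-state table of empirical action frequencies, that each case is indexed by a numeric feature vector describing the environment, and that retrieval picks the nearest case by squared Euclidean distance. All names here (`elicit_policy`, `CaseBase`, `act`, the state and action labels) are hypothetical.

```python
import random
from collections import Counter, defaultdict


def elicit_policy(observations):
    """Step 1a (assumed form): estimate a stochastic policy from observed
    (state, action) pairs as per-state empirical action probabilities."""
    counts = defaultdict(Counter)
    for state, action in observations:
        counts[state][action] += 1
    return {
        state: {a: n / sum(c.values()) for a, n in c.items()}
        for state, c in counts.items()
    }


class CaseBase:
    """Stores (features, policy) cases; the feature vector is a
    hypothetical numeric description of the environment."""

    def __init__(self):
        self.cases = []

    def retain(self, features, observations):
        # Step 1b: keep the elicited policy as a case.
        self.cases.append((tuple(features), elicit_policy(observations)))

    def retrieve(self, features):
        # Step 2 (assumed similarity): nearest case by squared distance.
        def distance(case):
            return sum((x - y) ** 2 for x, y in zip(case[0], features))
        return min(self.cases, key=distance)[1]


def act(policy, state, rng):
    """Step 3: sample an action from the retrieved policy.
    Raises KeyError if the state was never observed."""
    dist = policy[state]
    actions = list(dist)
    weights = [dist[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


# Hypothetical usage: two observed opponents, then mimic the closer one.
case_base = CaseBase()
case_base.retain((1.0, 0.0), [("s0", "defend"), ("s0", "defend"), ("s0", "attack")])
case_base.retain((0.0, 1.0), [("s0", "attack")])
policy = case_base.retrieve((0.9, 0.1))
action = act(policy, "s0", random.Random(0))
```

Sampling from the stored distribution, rather than always taking the most frequent action, is what lets the agent mimic a stochastic strategy instead of a deterministic approximation of it.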

Bibliographic details

Source: International Conference on Case-Based Reasoning, 2010, 15 pages
Authors: Kellen Gillespie; Justin Karneeb; Stephen Lee-Urban; Hector Munoz-Avila
Original format: PDF
Chinese Library Classification: TP18-53
Keywords: Learning from observation; Case capture and reuse; Policy
