Online Multiagent Learning against Memory Bounded Adversaries

Abstract

The traditional agenda in Multiagent Learning (MAL) has been to develop learners that guarantee convergence to an equilibrium in self-play or that converge to playing the best response against an opponent using one of a fixed set of known targeted strategies. This paper introduces an algorithm called Learn or Exploit for Adversary Induced Markov Decision Process (LoE-AIM) that targets optimality against any learning opponent that can be treated as a memory bounded adversary. LoE-AIM makes no prior assumptions about the opponent and is tailored to optimally exploit any adversary which induces a Markov decision process in the state space of joint histories. LoE-AIM either explores and gathers new information about the opponent or converges to the best response to the partially learned opponent strategy in repeated play. We further extend LoE-AIM to account for online repeated interactions against the same adversary with plays against other adversaries interleaved in between. LoE-AIM-repeated stores learned knowledge about an adversary, identifies the adversary in case of repeated interaction, and reuses the stored knowledge about the behavior of the adversary to enhance learning in the current epoch of play. LoE-AIM and LoE-AIM-repeated are fully implemented, with results demonstrating their superiority over other existing MAL algorithms.
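
The idea that a memory-bounded adversary induces a Markov decision process over joint histories can be illustrated with a small sketch. This is not LoE-AIM itself: the adversary here is known in advance (a Tit-for-Tat player with memory 1), whereas LoE-AIM has to learn the adversary's behavior through exploration. The Prisoner's Dilemma payoffs, the discount factor `gamma`, the use of discounted value iteration, and all function names are assumptions made for illustration only.

```python
from itertools import product

ACTIONS = ("C", "D")  # cooperate, defect

# Our payoffs in a Prisoner's Dilemma, keyed by (our action, their action).
# These numbers are assumed for illustration, not taken from the paper.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(history):
    """Hypothetical memory-1 adversary: repeat our previous action."""
    if not history:
        return "C"
    our_last, _their_last = history[-1]
    return our_last


def induced_mdp(adversary, memory):
    """Build the MDP induced by a deterministic memory-bounded adversary.

    A state is the tuple of the last `memory` joint actions (ours, theirs).
    `memory` must be at least 1.
    """
    joint = list(product(ACTIONS, repeat=2))
    states = list(product(joint, repeat=memory))
    transitions = {}
    for state in states:
        their_action = adversary(state)  # depends only on the bounded history
        for our_action in ACTIONS:
            step = (our_action, their_action)
            next_state = (state + (step,))[-memory:]
            transitions[state, our_action] = (next_state, PAYOFF[step])
    return states, transitions


def best_response(states, transitions, gamma=0.9, sweeps=500):
    """Discounted value iteration on the induced MDP (an assumed solver)."""
    values = {s: 0.0 for s in states}
    for _ in range(sweeps):
        values = {
            s: max(
                r + gamma * values[n]
                for n, r in (transitions[s, a] for a in ACTIONS)
            )
            for s in states
        }
    policy = {}
    for s in states:
        policy[s] = max(
            ACTIONS,
            key=lambda a: transitions[s, a][1] + gamma * values[transitions[s, a][0]],
        )
    return policy, values


states, transitions = induced_mdp(tit_for_tat, memory=1)
policy, values = best_response(states, transitions)
print(policy)
```

Under these assumed payoffs and `gamma=0.9`, the computed best response is to cooperate in every state, since defecting against Tit-for-Tat trades a one-step gain for a punished next round. A learner in the spirit of the abstract would have to estimate `transitions` from observed play instead of reading it off a known adversary.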

Bibliographic details

Source: European Conference on Machine Learning and Knowledge Discovery in Databases, 2008, 16 pages
Authors: Doran Chakraborty; Peter Stone
Original format: PDF
Chinese Library Classification: TP18-53

Similar literature

1. Multiagent learning in the presence of memory-bounded agents [J]. Doran Chakraborty, Peter Stone. Autonomous agents and multi-agent systems, 2014, issue 2.
2. Bounded memory Dolev-Yao adversaries in collaborative systems [J]. Max Kanovich, Tajana Ban Kirigin, Vivek Nigam. Information and computation, 2014.
3. Quantum Algorithms for Learning Symmetric Juntas via the Adversary Bound [J]. Belovs Aleksandrs. Computational complexity, 2015, issue 2.
4. Online Multiagent Learning against Memory Bounded Adversaries [C]. Doran Chakraborty, Peter Stone. European Conference on Machine Learning and Knowledge Discovery in Databases; ECML PKDD 2008, 2008.
5. Rule-based evolutionary online learning systems: Learning bounds, classification, and prediction [D]. Butz, Martin Volker. 2004.
6. Online Learning and Memory of Neural Trajectory Replays for Prefrontal Persistent and Dynamic Representations in the Irregular Asynchronous State [O]. Matthieu X. B. Sarazin, Julie Victor, David Medernach. 2021.
7. Online Multiagent Learning against Memory Bounded Adversaries [O]. Doran Chakraborty, Peter Stone. 2009.
