IEEE/WIC/ACM International Conference on Intelligent Agent Technology

Combining Dynamic Reward Shaping and Action Shaping for Coordinating Multi-agent Learning



Abstract

Coordinating multi-agent reinforcement learning provides a promising approach to scaling learning in large cooperative multi-agent systems. It allows agents to learn local decision policies based on their local observations and rewards while coordinating the agents' learning processes to ensure global learning performance. One key question is how coordination mechanisms should impact learning algorithms so that agents' learning processes are guided and coordinated. This paper presents a new shaping approach that effectively integrates coordination mechanisms into local learning processes. This shaping approach uses a two-level agent organization structure and combines reward shaping and action shaping. The higher-level agents dynamically and periodically produce shaping heuristic knowledge based on the learning status of the lower-level agents. The lower-level agents then use this knowledge to coordinate their local learning processes with other agents. Experimental results show that our approach effectively speeds up the convergence of multi-agent learning in large systems.
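To make the abstract's idea concrete, the following is a minimal sketch (not the paper's actual algorithm) of how a two-level organization could combine the two shaping channels: a supervisor periodically inspects its subordinates' Q-values and pushes down a reward bonus (reward shaping) and a pruned action set (action shaping), which each lower-level Q-learner then folds into its local update. All class names, the toy task, and the advice rule are hypothetical.

```python
import random
from collections import defaultdict

ACTIONS = [0, 1]  # hypothetical toy task: 0 = act locally, 1 = cooperate

class Worker:
    """Lower-level agent: epsilon-greedy Q-learner that accepts shaping advice."""
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.2):
        self.q = defaultdict(float)        # Q[(state, action)]
        self.alpha, self.gamma, self.eps = alpha, gamma, epsilon
        self.allowed = list(ACTIONS)       # action shaping: supervisor may prune this
        self.bonus = 0.0                   # reward shaping: supervisor-set heuristic bonus

    def act(self, state):
        if random.random() < self.eps:
            return random.choice(self.allowed)
        return max(self.allowed, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s2):
        # Shaped reward: environment reward plus the supervisor's bonus
        # for the cooperative action.
        shaped = r + (self.bonus if a == 1 else 0.0)
        best_next = max(self.q[(s2, a2)] for a2 in self.allowed)
        self.q[(s, a)] += self.alpha * (shaped + self.gamma * best_next - self.q[(s, a)])

class Supervisor:
    """Higher-level agent: periodically derives shaping knowledge from
    the workers' current learning status and pushes it down."""
    def advise(self, workers):
        # Toy advice rule: if most workers already lean toward cooperating,
        # reinforce that tendency with a bonus; if all do, prune the other action.
        coop = sum(1 for w in workers if w.q[("s", 1)] >= w.q[("s", 0)])
        for w in workers:
            w.bonus = 0.5 if coop >= len(workers) / 2 else 0.0
            w.allowed = [1] if coop == len(workers) else list(ACTIONS)

def run(episodes=300, period=20, seed=0):
    random.seed(seed)
    workers = [Worker() for _ in range(4)]
    sup = Supervisor()
    for ep in range(episodes):
        for w in workers:
            a = w.act("s")
            r = 1.0 if a == 1 else 0.2     # cooperation pays off in this toy task
            w.update("s", a, r, "s")
        if ep % period == 0:               # dynamic, periodic shaping from above
            sup.advise(workers)
    return workers
```

The key design point mirrored from the abstract is that the shaping knowledge is not fixed in advance: `Supervisor.advise` recomputes it periodically from the workers' evolving Q-values, so the guidance adapts as learning progresses.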
