Conference: International Joint Conferences on Web Intelligence and Intelligent Agent Technologies

Globally Optimal Multi-agent Reinforcement Learning Parameters in Distributed Task Assignment



Abstract

Large-scale simulation studies are necessary to study both the learning behaviour of individual agents and the overall system dynamics. One reason is that planning algorithms that find optimal solutions to fully observable general decentralised Markov decision problems do not admit polynomial-time worst-case complexity bounds. Additionally, agent interaction often implies a non-stationary environment, which does not lend itself to asymptotically greedy policies; policies with a constant level of exploration are therefore required so that agents can adapt continuously. This paper casts the application domain of distributed task assignment into the formalisms of queueing theory, complex networks and decentralised Markov decision problems in order to analyse the impact of three parameters: the momentum of a standard back-propagation neural network function approximator, the discount factor of $\mathrm{SARSA}(0)$ reinforcement learning, and the $\epsilon$ parameter of the $\epsilon$-greedy policy. For this purpose, large queueing networks of one thousand interacting agents are evolved. A Kriging metamodel is fitted to the simulation results and, in combination with simulated annealing, used to find optimal operating conditions with respect to the total average response time. The insights gained from this study are significant in that they provide guidance for deploying large-scale distributed task assignment systems modelled as multi-agent queueing networks.
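The three parameters studied in the abstract meet in a single learning update. The following is a minimal Python sketch, not the authors' implementation. It assumes a linear action-value approximator trained by stochastic gradient descent with classical momentum (a simplified stand-in for the back-propagation network), a SARSA(0) target with discount factor `gamma`, and an $\epsilon$-greedy policy whose `epsilon` is held constant. The names `LinearSarsa` and `epsilon_greedy` and the toy queue-forwarding environment are hypothetical.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Return a uniformly random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


class LinearSarsa:
    """SARSA(0) with a linear approximator Q(s, a) = w[a] . phi(s).

    Hypothetical simplification: the paper uses a back-propagation neural
    network; here a linear model is trained by SGD with classical momentum.
    """

    def __init__(self, n_features, n_actions, lr=0.01, momentum=0.9, gamma=0.9):
        self.w = np.zeros((n_actions, n_features))
        self.velocity = np.zeros_like(self.w)  # previous weight change
        self.lr, self.momentum, self.gamma = lr, momentum, gamma

    def q(self, phi):
        return self.w @ phi

    def update(self, phi, a, reward, phi_next, a_next, terminal=False):
        # SARSA(0) bootstraps on the action actually chosen in the next state.
        target = reward if terminal else reward + self.gamma * self.q(phi_next)[a_next]
        td_error = target - self.q(phi)[a]
        # Semi-gradient of 0.5 * td_error**2 w.r.t. w[a] is -td_error * phi.
        grad = np.zeros_like(self.w)
        grad[a] = -td_error * phi
        # Momentum: new change = momentum * previous change - lr * gradient.
        self.velocity = self.momentum * self.velocity - self.lr * grad
        self.w += self.velocity
        return td_error


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    agent = LinearSarsa(n_features=4, n_actions=3, lr=0.05, momentum=0.5, gamma=0.9)
    epsilon = 0.1  # constant exploration, never decayed

    def observe():
        # Hypothetical state: three neighbour queue lengths plus a bias feature.
        queues = rng.integers(0, 10, size=3).astype(float)
        return queues, np.append(queues / 10.0, 1.0)

    queues, phi = observe()
    a = epsilon_greedy(agent.q(phi), epsilon, rng)
    for _ in range(2000):
        reward = -queues[a]  # forwarding to a shorter queue costs less delay
        queues_next, phi_next = observe()
        a_next = epsilon_greedy(agent.q(phi_next), epsilon, rng)
        agent.update(phi, a, reward, phi_next, a_next)
        queues, phi, a = queues_next, phi_next, a_next
    print("learned weights:\n", agent.w.round(2))
```

Because `epsilon` is never decayed, the agent keeps trying non-greedy neighbours. That is what lets it track a non-stationary environment, at the price of some suboptimal assignments.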
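The metamodel-based optimisation step from the abstract (fit a Kriging model, then search it with simulated annealing) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the paper's procedure: it uses ordinary Kriging with a Gaussian correlation and a fixed `theta` (rather than estimating correlation parameters by maximum likelihood), a Metropolis-style simulated annealing with geometric cooling, and a made-up function `toy_response_time` in place of the simulated total average response time. All names and settings are hypothetical.

```python
import numpy as np


class OrdinaryKriging:
    """Ordinary Kriging, Gaussian correlation, fixed theta (a simplification)."""

    def __init__(self, theta=10.0, nugget=1e-6):
        self.theta, self.nugget = theta, nugget

    def _corr(self, A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-self.theta * sq)

    def fit(self, X, y):
        self.X = np.asarray(X, float)
        y = np.asarray(y, float)
        R = self._corr(self.X, self.X) + self.nugget * np.eye(len(self.X))
        ones = np.ones(len(self.X))
        # Generalised least-squares estimate of the constant mean.
        self.mu = ones @ np.linalg.solve(R, y) / (ones @ np.linalg.solve(R, ones))
        self.alpha = np.linalg.solve(R, y - self.mu)
        return self

    def predict(self, x):
        r = self._corr(np.atleast_2d(np.asarray(x, float)), self.X)
        return self.mu + r @ self.alpha


def simulated_annealing(f, x0, lower, upper, rng, n_iter=5000,
                        t0=1.0, cooling=0.999, step=0.05):
    x = np.clip(np.asarray(x0, float), lower, upper)
    fx = f(x)
    best_x, best_f = x.copy(), fx
    t = t0
    for _ in range(n_iter):
        cand = np.clip(x + rng.normal(0.0, step, size=x.shape), lower, upper)
        fc = f(cand)
        # Metropolis rule: accept improvements, accept worse moves with prob exp(-d/t).
        if fc <= fx or rng.random() < np.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x.copy(), fx
        t *= cooling  # geometric cooling schedule
    return best_x, best_f


def toy_response_time(p):
    # Made-up surface standing in for simulation output; NOT data from the paper.
    momentum, gamma, eps = p
    return 1.0 + (momentum - 0.6) ** 2 + 2.0 * (gamma - 0.4) ** 2 + 3.0 * (eps - 0.1) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    lower, upper = np.zeros(3), np.ones(3)
    # Design points in (momentum, gamma, epsilon) space; in the study each
    # response would come from simulating the multi-agent queueing network.
    X = rng.uniform(lower, upper, size=(40, 3))
    y = np.array([toy_response_time(p) for p in X])
    model = OrdinaryKriging(theta=10.0).fit(X, y)
    best_x, best_y = simulated_annealing(
        lambda p: float(model.predict(p)[0]), np.full(3, 0.5), lower, upper, rng)
    print("predicted optimum (momentum, gamma, epsilon):", best_x.round(3), best_y)
```

The design choice is the usual one for metamodel-based optimisation: each simulation run is expensive, so the annealer queries the cheap Kriging predictor thousands of times instead of the simulator.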
