...
Journal of Supercomputing

Reinforcement learning technique using agent state occurrence frequency with analysis of knowledge sharing on the agent's learning process in multiagent environments

Abstract

Reinforcement learning techniques such as Q-Learning, as well as the Multiple-Lookahead-Levels technique we introduced in our prior work, require the agent to complete an initial exploratory path followed by as many hypothetical and physical paths as necessary to find the optimal path to the goal. This paper introduces a reinforcement learning technique that uses a distance measure to the goal as the primary gauge for an autonomous agent's action selection. We take advantage of the first random walk to acquire initial information about the goal. Once the goal is reached, the agent's first perceived internal model of the environment is updated to reflect and include that goal; this is done by the agent tracing its steps back to its starting point. We show that no exploratory or hypothetical paths are required after the goal is first reached or detected, and that the agent needs at most two physical paths to find the optimal path to the goal. The agent's state occurrence frequency is also introduced and used to support the proposed Distance-Only technique. A computation speed performance analysis is carried out, and the resulting Distance-and-Frequency technique is shown to require less computation time than Q-Learning. Furthermore, we present and demonstrate how multiple agents using the Distance-and-Frequency technique can share knowledge of the environment, and we study the effect of that knowledge sharing on the agents' learning process.
