IASTED International Conference on Artificial Intelligence and Soft Computing

COOPERATIVE REINFORCEMENT LEARNING USING AN EXPERT-MEASURING WEIGHTED STRATEGY WITH WOLF

Abstract

Gradient descent learning algorithms have proven effective in solving mixed-strategy games. The policy hill climbing (PHC) variants of WoLF (Win or Learn Fast) and PDWoLF (Policy Dynamics based WoLF) have both shown rapid convergence to equilibrium solutions by increasing the accuracy of their gradient parameters over standard Q-learning. Likewise, cooperative learning techniques using weighted strategy sharing (WSS) and expertness measurements improve agent performance when multiple agents are solving a common goal. By combining these cooperative techniques with fast gradient descent learning, an agent's performance converges to a solution at an even faster rate. This claim is verified in a stochastic grid-world environment using a limited-visibility hunter-prey model with both random and intelligent prey. Across five different expertness measurements, cooperative learning using each PHC algorithm converges faster than independent learning when agents learn strictly from better-performing agents.
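The two ingredients the abstract combines, a WoLF-PHC gradient learner and weighted strategy sharing driven by an expertness measure, can be sketched roughly as follows. This is a hypothetical illustration, not the paper's implementation: the class and function names, the use of accumulated reward as the expertness measure (one of the five the abstract mentions), and all parameter values are assumptions.

```python
import random
from collections import defaultdict


class WoLFPHCAgent:
    """Sketch of a WoLF-PHC learner (illustrative, not the paper's code)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04):
        self.n = n_actions
        self.alpha, self.gamma = alpha, gamma
        self.delta_win, self.delta_lose = delta_win, delta_lose
        self.Q = defaultdict(lambda: [0.0] * n_actions)
        self.pi = defaultdict(lambda: [1.0 / n_actions] * n_actions)
        self.avg_pi = defaultdict(lambda: [1.0 / n_actions] * n_actions)
        self.counts = defaultdict(int)
        self.total_reward = 0.0   # assumed expertness measure: accumulated reward

    def act(self, state):
        # Sample an action from the mixed strategy pi(state, .).
        r, acc = random.random(), 0.0
        for a, p in enumerate(self.pi[state]):
            acc += p
            if r <= acc:
                return a
        return self.n - 1

    def update(self, s, a, r, s_next):
        # Standard Q-learning backup.
        self.Q[s][a] += self.alpha * (
            r + self.gamma * max(self.Q[s_next]) - self.Q[s][a])
        self.total_reward += r
        # Incrementally track the average policy pi-bar.
        self.counts[s] += 1
        c = self.counts[s]
        for i in range(self.n):
            self.avg_pi[s][i] += (self.pi[s][i] - self.avg_pi[s][i]) / c
        # WoLF criterion: small step when winning, large step when losing,
        # judged by expected value of pi versus pi-bar under current Q.
        winning = (sum(p * q for p, q in zip(self.pi[s], self.Q[s]))
                   > sum(p * q for p, q in zip(self.avg_pi[s], self.Q[s])))
        delta = self.delta_win if winning else self.delta_lose
        # Hill-climb toward the greedy action, then re-project onto the simplex.
        best = max(range(self.n), key=lambda i: self.Q[s][i])
        for i in range(self.n):
            step = delta if i == best else -delta / (self.n - 1)
            self.pi[s][i] = min(1.0, max(0.0, self.pi[s][i] + step))
        z = sum(self.pi[s])
        self.pi[s] = [p / z for p in self.pi[s]]


def share_weighted_strategies(agents):
    """Weighted strategy sharing: each agent replaces its Q-table with a
    blend of all agents' tables, weighted by relative expertness."""
    expertness = [max(ag.total_reward, 0.0) for ag in agents]
    total = sum(expertness) or 1.0
    w = [e / total for e in expertness]
    states = set().union(*(ag.Q.keys() for ag in agents))
    blended = {s: [sum(w[j] * agents[j].Q[s][i] for j in range(len(agents)))
                   for i in range(agents[0].n)]
               for s in states}
    for ag in agents:
        for s, qs in blended.items():
            ag.Q[s] = list(qs)
```

Under WSS the sharing step would run periodically between learning episodes, so that less expert agents pull their value estimates toward those of better-performing teammates while the WoLF-PHC updates continue in between.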
