49th IEEE Conference on Decision and Control

Online solution of nonlinear two-player zero-sum games using synchronous policy iteration



Abstract

In this paper we present an online gaming algorithm based on policy iteration to solve the continuous-time (CT) two-player zero-sum game with infinite-horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns online, in real time, the solution to the game design Hamilton-Jacobi-Isaacs (HJI) equation. The method finds, in real time, suitable approximations of the optimal value and of the saddle-point control and disturbance policies, while also guaranteeing closed-loop stability. The adaptive algorithm is implemented as an actor/critic structure that involves simultaneous continuous-time adaptation of the critic, control actor, and disturbance neural networks. We call this online gaming algorithm 'synchronous' zero-sum game policy iteration. A persistence of excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for the critic, actor, and disturbance networks. Convergence to the optimal saddle-point solution is proven, and stability of the system is also guaranteed. Simulation examples show the effectiveness of the new algorithm.
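To make the policy-iteration idea concrete, here is a minimal sketch for a hypothetical scalar linear-quadratic special case with known dynamics, dx/dt = a*x + b*u + k*d, and cost equal to the integral of q*x^2 + r*u^2 - gamma^2*d^2. With a quadratic value V(x) = p*x^2, the HJI equation reduces to the scalar game Riccati equation 0 = q + 2*a*p - (b^2/r - k^2/gamma^2)*p^2. The saddle-point policies are then u = -(b*p/r)*x and d = (k*p/gamma^2)*x. The code alternates exact policy evaluation with improvement of both players, and it runs offline. It does not reproduce the paper's online neural-network tuning laws or its persistence-of-excitation analysis. The function names and parameter values are illustrative assumptions.

```python
import math

# Hypothetical scalar zero-sum game (illustrative values, not from the paper):
#   dx/dt = a*x + b*u + k*d,  cost = integral of q*x^2 + r*u^2 - gamma^2*d^2
# With V(x) = p*x^2 the HJI equation becomes the scalar game Riccati equation
#   0 = q + 2*a*p - (b^2/r - k^2/gamma^2) * p^2


def evaluate_policies(a, b, k, q, r, gamma, K, L):
    """Policy evaluation: value coefficient p for u = -K*x and d = L*x."""
    a_cl = a - b * K + k * L  # closed-loop pole under the current policies
    if a_cl >= 0:
        raise ValueError("policies must be stabilizing to be evaluated")
    # Bellman equation: 0 = q + r*K^2 - gamma^2*L^2 + 2*p*a_cl
    return (q + r * K**2 - gamma**2 * L**2) / (-2.0 * a_cl)


def zero_sum_policy_iteration(a, b, k, q, r, gamma, K0, tol=1e-12, max_iter=100):
    """Offline policy iteration that improves control and disturbance together."""
    K, L = K0, 0.0  # stabilizing initial control gain, zero initial disturbance
    p_prev = None
    for _ in range(max_iter):
        p = evaluate_policies(a, b, k, q, r, gamma, K, L)
        # Policy improvement with dV/dx = 2*p*x:
        #   u = -(1/(2r)) * b * dV/dx,  d = (1/(2*gamma^2)) * k * dV/dx
        K, L = b * p / r, k * p / gamma**2
        if p_prev is not None and abs(p - p_prev) < tol:
            break
        p_prev = p
    return p, K, L


if __name__ == "__main__":
    a, b, k, q, r, gamma = 1.0, 1.0, 1.0, 1.0, 1.0, 2.0
    p, K, L = zero_sum_policy_iteration(a, b, k, q, r, gamma, K0=2.0)
    c = b**2 / r - k**2 / gamma**2  # must be positive for this closed form
    p_star = (a + math.sqrt(a**2 + c * q)) / c  # stabilizing Riccati root
    print(f"p = {p:.6f}, closed form = {p_star:.6f}, K = {K:.6f}, L = {L:.6f}")
```

In this scalar case, updating both players at once makes each iteration a Newton step on the Riccati equation. That is why the value coefficient converges in a handful of iterations from a stabilizing start. Here K0 = 2 gives an initial closed-loop pole of -1. The closed form assumes b^2/r > k^2/gamma^2, which means gamma is large enough for a positive stabilizing solution to exist.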
