Online solution of nonlinear two-player zero-sum games using synchronous policy iteration

Abstract

In this paper we present an online gaming algorithm based on policy iteration to solve the continuous-time (CT) two-player zero-sum game with infinite-horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns online, in real time, the solution to the game design Hamilton-Jacobi-Isaacs (HJI) equation. The method finds, in real time, suitable approximations of the optimal value and of the saddle-point control and disturbance policies, while also guaranteeing closed-loop stability. The adaptive algorithm is implemented as an actor/critic structure that involves simultaneous continuous-time adaptation of critic, control actor, and disturbance neural networks. We call this online gaming algorithm 'synchronous' zero-sum game policy iteration. A persistence of excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for the critic, actor, and disturbance networks. Convergence to the optimal saddle-point solution is proven, and stability of the system is also guaranteed. Simulation examples show the effectiveness of the new algorithm.
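To make the idea of policy iteration for a zero-sum game concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a scalar linear system dx/dt = a·x + b·u + k·d with cost integrand q·x² + r·u² − γ²·d², so the value function is V(x) = p·x² and the HJI equation reduces to the scalar equation q + 2ap − (b²/r − k²/γ²)p² = 0. The iteration runs offline and evaluates each policy pair exactly, whereas the abstract describes online, simultaneous tuning of three neural networks. The function names, parameter names, and numerical values are hypothetical and chosen only for illustration.

```python
def hji_residual(p, a, b, k, q, r, gamma):
    """Left-hand side of the scalar HJI equation for V(x) = p * x**2."""
    return q + 2.0 * a * p - (b**2 / r - k**2 / gamma**2) * p**2


def game_policy_iteration(a, b, k, q, r, gamma, max_iters=50, tol=1e-12):
    """Offline policy iteration for the scalar zero-sum game (illustrative).

    Policies are u = -ku * x (control) and d = kd * x (disturbance).
    Assumption: a < 0, so the zero initial gains are stabilizing.
    """
    ku, kd = 0.0, 0.0
    p = 0.0
    for _ in range(max_iters):
        a_cl = a - b * ku + k * kd  # closed-loop pole under current gains
        if a_cl >= 0.0:
            raise ValueError("closed loop is not stable; evaluation is undefined")
        # Policy evaluation: solve 0 = q + r*ku^2 - gamma^2*kd^2 + 2*p*a_cl for p.
        p_next = -(q + r * ku**2 - gamma**2 * kd**2) / (2.0 * a_cl)
        # Policy improvement: update both players from the new value.
        ku = b * p_next / r
        kd = k * p_next / gamma**2
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    return p, ku, kd


if __name__ == "__main__":
    params = dict(a=-1.0, b=1.0, k=1.0, q=1.0, r=1.0, gamma=2.0)
    p, ku, kd = game_policy_iteration(**params)
    print("p =", p, "ku =", ku, "kd =", kd)
    print("HJI residual =", hji_residual(p, **params))
```

Updating both players at every step makes this iteration equivalent to Newton's method on the scalar equation, which is why a stabilizing initial policy matters. The sketch raises an error instead of continuing if the closed loop becomes unstable.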


Bibliographic record

Source
49th IEEE Conference on Decision and Control | 2010 | pp. 3040-3047 | 8 pages
Authors
Vamvoudakis K.G.; Lewis F.L.
Original format: PDF
CLC classification: Automatic control theory
Keywords
Approximate Dynamic Programming; H-infinity; Hamilton-Jacobi-Isaacs equation; Nash-equilibrium; Persistence of Excitation; Policy Iteration; Synchronous Zero-Sum Game Policy Iteration

Similar literature

1. Online solution of nonlinear two-player zero-sum games using synchronous policy iteration [J]. Vamvoudakis K.G., Lewis F.L. International Journal of Robust and Nonlinear Control. 2012, no. 13
2. Online Solution of Two-Player Zero-Sum Games for Continuous-Time Nonlinear Systems With Completely Unknown Dynamics [J]. Yue Fu, Tianyou Chai. IEEE Transactions on Neural Networks and Learning Systems. 2016, no. 12
3. Stable value iteration for two-player zero-sum game of discrete-time nonlinear systems based on adaptive dynamic programming [J]. Song Ruizhuo, Zhu Liao. Neurocomputing. 2019, issue "MAYa7"
4. Online Solution of Nonlinear Two-Player Zero-Sum Games Using Synchronous Policy Iteration [C]. Kyriakos G. Vamvoudakis, F.L. Lewis. IEEE Conference on Decision and Control. 2010
5. Deception in two-player zero-sum stochastic games: Theory and application to warfare games [D]. Singh, Rajdeep. 2006
6. Modified Asano-Ohya-Khrennikov quantum-like model for decision-making process in a two-player game with nonlinear self- and cross-interaction terms of brain's amygdala and prefrontal-cortex [O]. Luluk Muthoharoh, Hendradi Hardhienata, Husin Alatas. 2020
7. Online Gaming: Real Time Solution of Nonlinear Two-Player Zero-Sum Games Using Synchronous Policy Iteration [O]. Kyriakos G., Frank L. 2011
