System and Method for Policy Optimization using Quasi-Newton Trust Region Method

Abstract

A computer-implemented learning method for optimizing a control policy that controls a system is provided. The method includes receiving states of the system being operated for a specific task; initializing the control policy as a function approximator including neural networks; collecting state transition and reward data using a current control policy; estimating an advantage function and a state visitation frequency based on the current control policy; updating the current control policy by quasi-Newton trust region policy optimization, using a second-order approximation of the objective function and a second-order approximation of the KL-divergence constraint on the permissible change in the policy; and determining an optimal control policy for controlling the system based on the average reward accumulated using the updated current control policy.
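
The update step described in the abstract can be read as a constrained quadratic subproblem. The following is only a sketch of one plausible formulation, not the claimed method itself. All symbols are assumptions introduced here for illustration: \theta_k denotes the current policy parameters, g_k the gradient of the policy objective at \theta_k (estimated from the advantage function and state visitation frequency), B_k a quasi-Newton approximation of the objective's Hessian, F_k the Hessian of the average KL divergence between the current and updated policies, and \delta the trust-region radius.

```latex
\begin{aligned}
\theta_{k+1} = \arg\max_{\theta} \quad
  & g_k^{\top}(\theta - \theta_k)
    + \tfrac{1}{2}\,(\theta - \theta_k)^{\top} B_k\,(\theta - \theta_k) \\
\text{subject to} \quad
  & \tfrac{1}{2}\,(\theta - \theta_k)^{\top} F_k\,(\theta - \theta_k) \le \delta
\end{aligned}
```

Under this reading, the first line is the second-order approximation of the objective function and the constraint is the second-order approximation of the KL divergence that bounds the permissible change in the policy.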

Bibliographic details

Publication number: US2021103255A1

Patent type:
Publication date: 2021-04-08

Original format: PDF
Applicant/Assignee: MITSUBISHI ELECTRIC RESEARCH LABORATORIES INC.
Application number: US201916592977
Inventors: DEVESH JHA; ARVIND RAGHUNATHAN; DIEGO ROMERES
Filing date: 2019-10-04
Classification: G05B13/02; G05B13/04
Country: US
Date added to database: 2022-08-24 17:24:02
