System and Method for Policy Optimization using Quasi-Newton Trust Region Method

Abstract

A computer-implemented learning method for optimizing a control policy that controls a system is provided. The method includes receiving states of the system being operated for a specific task; initializing the control policy as a function approximator including neural networks; collecting state transition and reward data using a current control policy; estimating an advantage function and a state visitation frequency based on the current control policy; updating the current control policy by quasi-Newton trust region policy optimization, using a second-order approximation of the objective function and a second-order approximation of the KL-divergence constraint on the permissible change in the policy; and determining an optimal control policy for controlling the system, based on the average reward accumulated using the updated current control policy.
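
The policy update step described above can be illustrated with a minimal sketch in Python. The abstract does not specify how the quasi-Newton model is built or how the constrained subproblem is solved, so everything here is an assumption made for illustration: the function names `bfgs_update` and `trust_region_step` are hypothetical, the Hessian approximation `B` of the (negated) objective is maintained with the standard BFGS formula, the KL-divergence constraint is modeled as `0.5 * s @ F @ s <= delta` with `F` a Fisher-information-like matrix, and the subproblem is solved only approximately by scaling the quasi-Newton step back into the trust region.

```python
import numpy as np


def bfgs_update(B, s, y):
    """Standard BFGS update of a Hessian approximation B.

    s is the parameter change and y the gradient change between two
    iterates. The update is skipped when the curvature s.y is not
    positive, which keeps B positive definite.
    """
    sy = float(s @ y)
    if sy <= 1e-10:
        return B
    Bs = B @ s
    return B - np.outer(Bs, Bs) / float(s @ Bs) + np.outer(y, y) / sy


def trust_region_step(g, B, F, delta):
    """Approximate minimizer of g.s + 0.5 s'Bs subject to 0.5 s'Fs <= delta.

    Assumption for this sketch: take the quasi-Newton step -B^{-1} g and,
    if it violates the quadratic KL model, shrink it onto the boundary.
    This is feasible but not the exact subproblem solution.
    """
    s = -np.linalg.solve(B, g)
    kl = 0.5 * float(s @ F @ s)
    if kl > delta:
        s = s * np.sqrt(delta / kl)
    return s


# Toy usage with made-up numbers (not data from the patent).
g = np.array([1.0, -2.0])                 # gradient of the loss (negated objective)
B = np.eye(2)                             # initial Hessian approximation
F = np.array([[2.0, 0.0], [0.0, 1.0]])    # Fisher-like matrix for the KL model
delta = 0.01                              # permitted KL change per update

step = trust_region_step(g, B, F, delta)
B_next = bfgs_update(B, np.array([1.0, 0.0]), np.array([2.0, 0.0]))
```

With a positive definite `B`, the returned step is a descent direction for the quadratic model and satisfies the quadratic KL bound by construction. An exact solver for the constrained subproblem would generally return a different step when the constraint is active.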