System and Method for Policy Optimization using Quasi-Newton Trust Region Method
Abstract
A computer-implemented learning method for optimizing a control policy that controls a system is provided. The method includes receiving states of the system being operated for a specific task; initializing the control policy as a function approximator including neural networks; collecting state transition and reward data using a current control policy; estimating an advantage function and a state visitation frequency based on the current control policy; updating the current control policy by quasi-Newton trust region policy optimization, using a second-order approximation of the objective function and a second-order approximation of the KL-divergence constraint on the permissible change in the policy; and determining an optimal control policy for controlling the system, based on the average reward accumulated using the updated current control policy.
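The update step described in the abstract can be illustrated with a minimal sketch. It is not the patented procedure: it assumes a BFGS update as the quasi-Newton curvature estimate, treats the objective as a loss to be minimized (the negative of the surrogate reward), and enforces the quadratic KL bound by simply scaling the step back. The names `bfgs_update`, `trust_region_step`, `B`, `F`, and `delta` are hypothetical and do not come from the source.

```python
import numpy as np


def bfgs_update(B, s, y, eps=1e-10):
    """BFGS update of a Hessian approximation B (assumed quasi-Newton variant).

    s is the last parameter step, y the corresponding change in the gradient.
    """
    sy = float(s @ y)
    if sy <= eps:
        # Curvature condition fails: skip the update so B stays positive definite.
        return B
    Bs = B @ s
    return B - np.outer(Bs, Bs) / float(s @ Bs) + np.outer(y, y) / sy


def trust_region_step(g, B, F, delta):
    """Step d that lowers the model g.d + 0.5 d'Bd while keeping 0.5 d'Fd <= delta.

    g: gradient of the loss, B: quasi-Newton Hessian approximation,
    F: Hessian of the KL divergence (second-order KL model), delta: KL bound.
    """
    d = -np.linalg.solve(B, g)    # unconstrained quasi-Newton step
    kl = 0.5 * float(d @ F @ d)   # second-order estimate of the KL change
    if kl > delta:
        d = d * np.sqrt(delta / kl)  # shrink the step onto the trust region boundary
    return d
```

In a full training loop, `g` would be estimated from the collected transitions and advantage estimates, and `B` would be refreshed with `bfgs_update` after each accepted step. Scaling the step back is the simplest way to respect the constraint; an exact trust region subproblem solver would generally return a different direction when the constraint is active.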