INVERSE REINFORCEMENT LEARNING METHOD, STORAGE MEDIUM FOR STORING INSTRUCTION TO EXECUTE PROCESSOR FOR PROCESS FOR INVERSE REINFORCEMENT LEARNING, SYSTEM FOR INVERSE REINFORCEMENT LEARNING, AND PREDICTION SYSTEM INCLUDING SYSTEM FOR INVERSE REINFORCEMENT LEARNING

Abstract

A method of inverse reinforcement learning for estimating the cost and value functions of a subject's behaviors includes: acquiring data representing changes in state variables that define the behaviors of the subject; applying a modified Bellman equation, Eq. (1), to the acquired data: q(x) + gV(y) − V(x) = −ln{pi(y|x)/p(y|x)} (1), where q(x) and V(x) denote the cost function and the value function, respectively, at state x, g is a discount factor, and p(y|x) and pi(y|x) denote the state transition probabilities before and after learning, respectively; estimating the density ratio pi(y|x)/p(y|x) in Eq. (1); estimating q(x) and V(x) in Eq. (1) by the least squares method in accordance with the estimated density ratio; and outputting the estimated q(x) and V(x).
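The least-squares step of the abstract can be illustrated with a small numerical sketch. This is not the patented implementation: it assumes a tabular setting with a handful of discrete states, represents q and V with one parameter per state, and uses the exact log density ratio ln{pi(y|x)/p(y|x)} in place of a ratio estimated from data. The function names `make_toy_problem` and `fit_cost_and_value` are hypothetical, and no regularization is applied. In this tabular form Eq. (1) determines q and V only up to a shift (adding a constant c to V and (1 − g)c to q leaves the left-hand side unchanged), so the solver returns the minimum-norm solution.

```python
import numpy as np


def make_toy_problem(n_states=5, g=0.9, seed=0):
    """Build a toy tabular problem that satisfies Eq. (1) exactly (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Transition probabilities before learning, p(y|x); rows sum to 1.
    p = rng.random((n_states, n_states)) + 0.1
    p /= p.sum(axis=1, keepdims=True)
    # Arbitrary "true" value function chosen for the demonstration.
    V = rng.normal(size=n_states)
    # Transition probabilities after learning: pi(y|x) proportional to p(y|x) * exp(-g V(y)).
    w = p * np.exp(-g * V)[None, :]
    Z = w.sum(axis=1)
    pi = w / Z[:, None]
    # Cost that makes q(x) + g V(y) - V(x) = -ln(pi(y|x) / p(y|x)) hold for all x, y.
    q = V + np.log(Z)
    return p, pi, q, V


def fit_cost_and_value(log_ratio, g):
    """Least-squares fit of tabular q and V from log_ratio[x, y] = ln(pi(y|x) / p(y|x))."""
    n = log_ratio.shape[0]
    rows, targets = [], []
    for x in range(n):
        for y in range(n):
            row = np.zeros(2 * n)   # parameters: [q(0..n-1), V(0..n-1)]
            row[x] += 1.0           # coefficient of q(x)
            row[n + y] += g         # coefficient of g * V(y)
            row[n + x] -= 1.0       # coefficient of -V(x)
            rows.append(row)
            targets.append(-log_ratio[x, y])
    A = np.vstack(rows)
    b = np.array(targets)
    # Minimum-norm least-squares solution of A @ theta = b.
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta[:n], theta[n:], A, b


g = 0.9
p, pi, q_true, V_true = make_toy_problem(g=g)
q_est, V_est, A, b = fit_cost_and_value(np.log(pi / p), g)

residual = A @ np.concatenate([q_est, V_est]) - b
print("max |residual| of Eq. (1):", np.abs(residual).max())
print("V_est - V_true:", V_est - V_true)
```

With real data the exact `np.log(pi / p)` would be replaced by an estimated log density ratio, and the one-hot parameterization by whatever function approximators are chosen for q and V; both substitutions are outside the scope of this sketch.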
