METHOD, AND CONTROLLER AND CONTROL PROGRAM THEREOF, FOR UPDATING POLICY PARAMETERS UNDER MARKOV DECISION PROCESS SYSTEM ENVIRONMENT
Abstract
PROBLEM TO BE SOLVED: To learn a decision-making model while suppressing an unnecessary increase in mixing time.

SOLUTION: A technique for updating a parameter (policy parameter) that defines a policy under a Markov decision process system environment updates the policy parameter according to an update equation. The update equation includes a term for decreasing a weighted sum (the weighted expected hitting time sum), taken over a first state (s) and a second state (s'), of a statistic (the expected hitting time function) on the number of steps (the hitting time) required to first transition from the first state (s) to the second state (s').
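The abstract does not give the update equation itself, so the following is only a minimal sketch of the general idea, not the patented method. It assumes a small finite MDP with two actions, a per-state sigmoid policy, a caller-supplied reward gradient, and a finite-difference gradient of the weighted expected hitting time sum. All function names, parameters (`lr`, `lam`, `w`), and the example matrices are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def policy_chain(theta, P0, P1):
    """Markov chain induced by the policy pi(a=1 | s) = sigmoid(theta[s]) (assumed form)."""
    n = len(theta)
    chain = []
    for s in range(n):
        p1 = sigmoid(theta[s])
        chain.append([(1.0 - p1) * P0[s][x] + p1 * P1[s][x] for x in range(n)])
    return chain


def expected_hitting_times(P, tol=1e-12, max_iters=100000):
    """h[s][t]: expected number of steps to first reach t from s, with h[t][t] = 0.

    Solves h(s, t) = 1 + sum_{x != t} P[s][x] * h(x, t) by fixed-point iteration.
    Assumes every state can reach every other state (irreducible chain).
    """
    n = len(P)
    h = [[0.0] * n for _ in range(n)]
    for t in range(n):
        col = [0.0] * n
        for _ in range(max_iters):
            new = [
                0.0 if s == t
                else 1.0 + sum(P[s][x] * col[x] for x in range(n) if x != t)
                for s in range(n)
            ]
            delta = max(abs(a - b) for a, b in zip(new, col))
            col = new
            if delta < tol:
                break
        for s in range(n):
            h[s][t] = col[s]
    return h


def weighted_hitting_sum(theta, P0, P1, w):
    """Weighted sum over (s, s') of the expected hitting times under the current policy."""
    h = expected_hitting_times(policy_chain(theta, P0, P1))
    n = len(theta)
    return sum(w[s][t] * h[s][t] for s in range(n) for t in range(n))


def hitting_gradient(theta, P0, P1, w, eps=1e-5):
    """Central finite-difference gradient of the weighted sum (an illustrative choice)."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += eps
        down[i] -= eps
        diff = weighted_hitting_sum(up, P0, P1, w) - weighted_hitting_sum(down, P0, P1, w)
        grad.append(diff / (2.0 * eps))
    return grad


def update_policy_parameters(theta, reward_grad, P0, P1, w, lr=0.1, lam=1.0):
    """One step: follow the reward gradient while decreasing the weighted hitting time sum."""
    g = hitting_gradient(theta, P0, P1, w)
    return [theta[i] + lr * (reward_grad[i] - lam * g[i]) for i in range(len(theta))]


if __name__ == "__main__":
    # Hypothetical two-state example: action 0 mostly stays, action 1 mostly moves.
    P0 = [[0.9, 0.1], [0.1, 0.9]]
    P1 = [[0.1, 0.9], [0.9, 0.1]]
    w = [[0.0, 1.0], [1.0, 0.0]]  # uniform weights over distinct state pairs
    theta = [0.0, 0.0]
    new_theta = update_policy_parameters(theta, [0.0, 0.0], P0, P1, w)
    print(weighted_hitting_sum(theta, P0, P1, w))
    print(weighted_hitting_sum(new_theta, P0, P1, w))
```

In this sketch the coefficient `lam` controls how strongly the update penalizes long hitting times relative to the reward gradient. Setting it to zero recovers a plain gradient step on the reward.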