Journal: Operations Research: The Journal of the Operations Research Society of America

Performance guarantees for empirical Markov decision processes with applications to multiperiod inventory models



Abstract

We consider Markov decision processes (MDPs) with unknown transition probabilities and unknown single-period expected cost functions, and we study a method for estimating these quantities from historical or simulated data. The method requires knowledge of the system equations that govern state transitions as well as the single-period cost functions (but not the single-period expected cost functions). The estimation procedure is based upon taking expectations with respect to the empirical distribution functions of such data. Once the estimates are in place, the method computes a policy by solving the obtained "empirical" Markov decision process as if the estimates were correct. For MDPs that satisfy some conditions, we provide explicit, easily computed expressions for the probability that the procedure will produce a policy whose true expected cost is within any specified absolute distance of the actual optimal expected cost of the true Markov decision process. We also provide expressions for the number of historical or simulated data values that is sufficient for the procedure to produce a policy whose true expected cost is, with a prescribed probability, within a prescribed absolute distance of the actual optimal expected cost of the true Markov decision process. We apply our results to multiperiod inventory models. In addition, we provide a specialized analysis of such inventory models that also yields relative, rather than absolute, accuracy guarantees. We make comparisons with related results that have recently appeared, and we provide numerical examples.
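A minimal sketch of the empirical-MDP idea described above, applied to a small finite-horizon, lost-sales inventory model. Everything concrete here is a hypothetical assumption chosen for illustration and is not taken from the paper: the state space, horizon, cost coefficients, the uniform "true" demand `TRUE_PMF`, the sample size, and the helper names `next_state`, `period_cost`, `q_value`, `solve`, `evaluate`, and `empirical_pmf`. The known system equation (`next_state`) and the known single-period cost (`period_cost`) are used directly. Only the demand data enter the expectations, which are taken with respect to the empirical distribution. The resulting policy is then evaluated under the true distribution, giving the absolute optimality gap that the abstract's probability guarantees concern. The sketch does not compute the paper's bounds.

```python
import random
from collections import Counter

# Hypothetical model parameters (illustrative assumptions, not from the paper).
MAX_INV = 10                             # inventory levels 0..MAX_INV
HORIZON = 4                              # number of periods
C_ORDER, C_HOLD, C_LOST = 1.0, 0.5, 4.0  # per-unit ordering, holding, lost-sales costs


def next_state(x, q, d):
    """Known system equation: lost-sales inventory after ordering q and facing demand d."""
    return max(x + q - d, 0)


def period_cost(x, q, d):
    """Known single-period cost, as a function of the realized demand d."""
    y = x + q
    return C_ORDER * q + C_HOLD * max(y - d, 0) + C_LOST * max(d - y, 0)


def q_value(x, q, V, pmf):
    """Expected period cost plus continuation value V under a demand pmf {d: prob}."""
    return sum(p * (period_cost(x, q, d) + V[next_state(x, q, d)]) for d, p in pmf.items())


def solve(pmf):
    """Backward induction; returns (stage-0 values, policy[t][x]) for the MDP defined by pmf."""
    V = [0.0] * (MAX_INV + 1)  # zero terminal cost
    policy = []
    for _ in range(HORIZON):  # stages HORIZON-1 down to 0
        actions = [min(range(MAX_INV - x + 1), key=lambda q: q_value(x, q, V, pmf))
                   for x in range(MAX_INV + 1)]
        V = [q_value(x, actions[x], V, pmf) for x in range(MAX_INV + 1)]
        policy.insert(0, actions)
    return V, policy


def evaluate(policy, pmf):
    """Stage-0 expected cost of a fixed policy when demand actually follows pmf."""
    V = [0.0] * (MAX_INV + 1)
    for t in reversed(range(HORIZON)):
        V = [q_value(x, policy[t][x], V, pmf) for x in range(MAX_INV + 1)]
    return V


def empirical_pmf(samples):
    """Empirical distribution: each observed value gets weight count / n."""
    n = len(samples)
    return {d: c / n for d, c in Counter(samples).items()}


TRUE_PMF = {d: 1 / 7 for d in range(7)}       # hypothetical true demand: uniform on 0..6
rng = random.Random(0)
data = [rng.randrange(7) for _ in range(50)]  # 50 simulated demand observations

_, emp_policy = solve(empirical_pmf(data))    # solve the empirical MDP as if it were correct
opt_values, _ = solve(TRUE_PMF)               # true optimum (unknown in practice)
gap = evaluate(emp_policy, TRUE_PMF)[0] - opt_values[0]  # absolute gap from empty inventory
print(f"optimality gap at x=0: {gap:.4f}")
```

Replacing the 50 observations with a larger sample would typically shrink the printed gap. That is the qualitative behavior the paper's sample-size expressions quantify, but this sketch makes no claim about their form.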

