
Adaptive Optimization of Markov Reward Processes


Abstract

We consider the problem of optimizing the average reward of Markov chains controlled by two sets of parameters: 1) a set of tunable parameters, and 2) a set of fixed but unknown parameters. We study the convergence characteristics of recursive estimation procedures based on the observation of regenerative cycles. We also provide sufficient conditions for the convergence to local optima of existing simulation-based optimization procedures under parameter certainty, in order to achieve simultaneous optimal selection of the tunable parameters and identification of the unknown parameters. To illustrate our approach, we discuss an algorithm that exploits the gradient of the likelihood of an observed regenerative cycle, and its application to a regenerative simulation-based algorithm introduced in [1]. Our results are illustrated numerically on a problem of optimal pricing of services in a multi-class loss network.
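The idea of recursive estimation from regenerative cycles can be illustrated with a toy sketch. It is not the algorithm of the paper or of [1]. It assumes a hypothetical two-state chain in which state 0 is the regeneration state, the chain always moves from 0 to 1, and from state 1 it returns to 0 with an unknown probability theta at each step. The function names, the step-size schedule, and the clipping bounds are all assumptions made for illustration. Each observed cycle contributes the gradient of its log-likelihood to a stochastic-approximation update of the estimate.

```python
import random


def simulate_cycle(theta_true, rng):
    """Simulate one regenerative cycle of the toy chain (hypothetical model).

    Returns the number of failed return attempts made in state 1
    before the chain comes back to the regeneration state 0.
    """
    stays = 0
    while rng.random() >= theta_true:
        stays += 1
    return stays


def cycle_score(theta, stays):
    """Gradient in theta of the cycle log-likelihood log[(1 - theta)**stays * theta]."""
    return 1.0 / theta - stays / (1.0 - theta)


def estimate_theta(theta_true, n_cycles=20000, theta0=0.5, seed=0):
    """Recursive estimate of theta, updated once per observed cycle."""
    rng = random.Random(seed)
    theta = theta0
    for n in range(n_cycles):
        stays = simulate_cycle(theta_true, rng)
        step = 0.1 / (n + 10)  # diminishing step size (assumed schedule)
        theta += step * cycle_score(theta, stays)
        # Project onto a compact interval so the gradient stays bounded.
        theta = min(0.95, max(0.05, theta))
    return theta


if __name__ == "__main__":
    print(estimate_theta(0.3))
```

In this toy model the expected cycle score vanishes exactly at the true parameter, which is why the iterates settle near it. The setting described in the abstract additionally adjusts tunable parameters at the same time, which this sketch leaves out.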
