Adaptive Optimization of Markov Reward Processes



Abstract

We consider the problem of optimizing the average reward of Markov chains controlled by two sets of parameters: 1) a set of tunable parameters, and 2) a set of fixed but unknown parameters. We study the convergence characteristics of recursive estimation procedures based on the observation of regenerative cycles. We also provide sufficient conditions for the convergence to local optima of existing simulation-based optimization procedures under parameter certainty, in order to achieve simultaneous optimal selection of the tunable parameters and identification of the unknown parameters. To illustrate our approach, we discuss an algorithm that exploits the gradient of the likelihood of an observed regenerative cycle, and its application to a regenerative simulation-based algorithm introduced in [1]. Our results are illustrated numerically in a problem of optimal pricing of services in a multi-class loss network.
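As a rough illustration of recursive estimation from regenerative cycles, the sketch below runs a stochastic-gradient update on the log-likelihood of each observed cycle. It is not the algorithm of the paper: the two-state chain, the use of the log-likelihood (rather than the likelihood) gradient, the step-size schedule, and all function names are assumptions chosen only to keep the example small.

```python
import random


def simulate_cycle(theta, q, rng):
    """Simulate one regenerative cycle of a toy two-state chain (hypothetical model).

    The cycle starts in state 0 and ends at the next return to state 0.
    From 0 the chain jumps to 1 with probability theta (the unknown parameter);
    from 1 it returns to 0 with known probability q.
    """
    path = [0]
    if rng.random() < theta:
        path.append(1)
        while rng.random() >= q:  # stay in state 1 with probability 1 - q
            path.append(1)
    path.append(0)  # regeneration: the chain is back in state 0
    return path


def cycle_score(path, theta):
    """Gradient of the cycle's log-likelihood with respect to theta.

    Only the first transition out of state 0 depends on theta, so the
    gradient is 1/theta if the chain left state 0 and -1/(1 - theta) otherwise.
    """
    left_zero = path[1] == 1
    return 1.0 / theta if left_zero else -1.0 / (1.0 - theta)


def estimate_theta(true_theta=0.3, q=0.5, n_cycles=20000, seed=0):
    """Recursively estimate theta, using one gradient step per observed cycle."""
    rng = random.Random(seed)
    theta_hat = 0.5  # arbitrary initial guess
    for m in range(1, n_cycles + 1):
        path = simulate_cycle(true_theta, q, rng)
        step = 0.5 / (m + 10)  # diminishing step size (assumed schedule)
        theta_hat += step * cycle_score(path, theta_hat)
        theta_hat = min(max(theta_hat, 0.01), 0.99)  # keep the estimate inside (0, 1)
    return theta_hat


if __name__ == "__main__":
    print(estimate_theta())
```

In this toy setting the expected gradient vanishes exactly at the true parameter, so the iterate drifts toward it as more cycles are observed; the projection onto [0.01, 0.99] only guards against division by zero.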


Bibliographic details

Source
IEE Colloquium on Why aren't we Training Measurement Engineers?, 1992 | 1992 | pp. 8034-8041 | 8 pages
Authors
Campos-Nanez, E.; Patek, S.D.
Author affiliation

Department of Engineering Management and Systems Engineering, The George Washington University, 1776 G Street, Washington, DC 20052, USA; ecamposn@gwu.edu

Original format: PDF

Similar literature


1. Adaptive aggregation for reinforcement learning in average reward Markov decision processes [J]. Ronald Ortner. Annals of Operations Research, 2013, No. 1.
2. Approximation and adaptive control of Markov processes: Average reward criterion [J]. Onésimo Hernández-Lerma. Kybernetika, 1987, No. 4.
3. Adaptive Optimization of Markov Reward Processes [C]. Enrique Campos-Nanez, Stephen D. Patek. The Institute of Electrical and Electronics Engineers, Inc. IEEE Conference on Decision and Control, 2005.
4. Adaptive online optimization of Markov reward processes with application to pricing of multiclass loss network services [D]. Enrique Campos-Nanez. 2003.
5. Learning to maximize reward rate: a model based on semi-Markov decision processes [O]. Arash Khodadadi, Pegah Fakhari, Jerome R. Busemeyer. 2014.
6. Simulation-Based Optimization of Markov Reward Processes [O]. 2008.
