Optimal learning strategies for multi-attribute resource allocation problems

Abstract

We address the problem of optimal learning in approximate dynamic programming algorithms for solving discrete, multi-attribute resource allocation problems. Traditional approaches divide the problem into time stages and compute the optimal decisions for each time period using value functions that capture information about the future state of the system. The curses of dimensionality, typical of such problems, force us to use approximate dynamic programming, where we step forward in time and compute approximations of the value functions of only the states that are actually visited. The challenges include estimating the value of a multi-attribute resource whose attributes may be categorical in nature, and handling an extremely large number of attribute vectors, which makes it hard to form reliable value estimates with limited information.

In order to overcome the problem of large state spaces, we can represent the resources at different levels of aggregation, producing the classic tradeoff between statistical and structural error. We propose a method that estimates the value of a resource as a weighted combination of values at different levels of aggregation. We derive the optimal weights that minimize the expected error, taking into account that the values are correlated, and propose a few heuristic weighting schemes that seem promising. We demonstrate experimentally that our aggregation techniques produce significant improvements in solution quality, first for a single-asset allocation problem that can be solved to optimality and later in a dynamic fleet management problem with multiple asset classes.

The observations of value function estimates in approximate dynamic programming typically come from a data series that can be initially highly transient. The degree of transience affects the choice of stepsize parameters that produce the fastest convergence and can vary widely among the value function parameters of the same dynamic program. Practical computational work tends to use formulas with parameters that have to be tuned for specific applications. We derive an optimal stepsize formula that minimizes estimation error. Experimental work in several different problem settings shows that an approximation of this rule provides faster convergence than other popular formulas, without requiring any tuning of parameters.
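The aggregation idea can be illustrated with a short sketch. All numbers below are made up, and the weighting rule shown (weights inversely proportional to an estimate of each level's mean squared error, ignoring the correlation between levels) is a simplified stand-in for the correlation-aware optimal weights derived in the thesis: disaggregate levels contribute low bias but high variance, aggregate levels the reverse, and the blend trades the two off.

```python
import numpy as np

def aggregation_weights(variances, biases):
    # Weight each aggregation level inversely to an estimate of its
    # mean squared error (sampling variance + squared aggregation bias),
    # then normalize.  This simple rule ignores the correlation between
    # levels that the optimal weights account for.
    mse = np.asarray(variances, dtype=float) + np.asarray(biases, dtype=float) ** 2
    w = 1.0 / mse
    return w / w.sum()

def blended_value(level_estimates, variances, biases):
    # Weighted combination of the value estimates of one attribute vector
    # observed at several levels of aggregation.
    w = aggregation_weights(variances, biases)
    return float(np.dot(w, level_estimates))

# Three levels, from disaggregate (noisy, unbiased) to aggregate (stable, biased).
v_hat = [12.4, 11.8, 10.5]   # value estimates per level (illustrative)
var_  = [4.0, 1.0, 0.2]      # sampling variance, shrinks with aggregation
bias_ = [0.0, 0.8, 2.5]      # aggregation bias, grows with aggregation
print(round(blended_value(v_hat, var_, bias_), 3))
```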
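The stepsize discussion can likewise be sketched. The recursion below uses an adaptive stepsize of the form alpha_n = 1 - sigma^2 / ((1 + lambda_{n-1}) sigma^2 + b_n^2), which stays near one while the transient bias b_n dominates and decays toward a 1/n-like rate once observation noise dominates. The noise variance and bias profile are supplied as known inputs purely for illustration; in practice they must be estimated from the data, and this sketch is not claimed to reproduce the exact formula derived in the dissertation.

```python
import numpy as np

def adaptive_smoothing(observations, noise_var, biases):
    # Recursive estimate theta_n = (1 - alpha_n) * theta_{n-1} + alpha_n * y_n,
    # where alpha_n = 1 - noise_var / ((1 + lam) * noise_var + bias_n**2)
    # and lam tracks Var(theta_n) / noise_var.  Large bias -> alpha near 1
    # (track the transient); small bias -> alpha decays like 1/n (average).
    theta, lam = observations[0], 1.0
    path = [theta]
    for y, b in zip(observations[1:], biases[1:]):
        alpha = 1.0 - noise_var / ((1.0 + lam) * noise_var + b ** 2)
        lam = alpha ** 2 + (1.0 - alpha) ** 2 * lam
        theta = (1.0 - alpha) * theta + alpha * y
        path.append(theta)
    return path

# Noisy observations of a level-10 signal, with an assumed transient bias
# profile that decays as the estimate catches up to the signal.
rng = np.random.default_rng(0)
obs = list(10.0 + rng.normal(0.0, 1.0, size=50))
bias_profile = list(10.0 * np.exp(-0.2 * np.arange(50)))
print(round(adaptive_smoothing(obs, 1.0, bias_profile)[-1], 3))
```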

Bibliographic details

  • Author: George, Abraham
  • Affiliation: Princeton University
  • Degree-granting institution: Princeton University
  • Subject: Operations research
  • Degree: Ph.D.
  • Year: 2005
  • Pagination: 194 p.
  • Total pages: 194
  • Original format: PDF
  • Language: eng
  • CLC classification:
  • Keywords:
