
Optimal and Simulation-Based Approximate Dynamic Programming Approaches for the Control of Re-Entrant Line Manufacturing Models.


Abstract

This dissertation considers the application of simulation-based Approximate Dynamic Programming (ADP) approaches for near-optimal control of Re-entrant Line Manufacturing (RLM) models. The study starts from an analysis of the optimal control problem under a discounted cost (DC) criterion in two simple RLM models with both job-sequencing and job-releasing control operations. Results on optimality conditions, structural properties of the optimal control policy, and sufficient conditions for optimality are provided. For the same models, four simulation-based ADP approaches, namely Q-Learning, Q-Learning with State Aggregation, SARSA(Lambda), and an Actor-Critic architecture, were utilized for control optimization. The ADP approaches studied include methods based on lookup tables and methods based on parametric approximations of the optimal cost function using temporal-difference learning. Numerous simulation experiments were conducted to evaluate and compare the performance of these ADP methods against that of the optimal solutions. The results indicate that the Actor-Critic approach consistently achieved performance close to the optimal solutions while offering the best scalability in the state and action spaces, which is essential for implementing ADP in realistic RLM models.

Building on these results, an extension of the Actor-Critic approach to larger RLM models is proposed under both the DC and the average cost (AC) criteria. The formulation of the proposed approach is based on representing the RLM system as a model with an arbitrary number of single exponential servers and binary controls, which can be seen as an abstraction of the simple RLM models studied previously. The proposed model is then amenable to the uniformization procedure, which in turn allows the derivation of optimality equations and conditions. These provide structural properties that also facilitate the definition and implementation of the controller, or actor, in the proposed ADP algorithm. As an example, the so-called Intel Mini-Fab model was utilized in numerous simulation experiments on the optimization of job-sequencing operations under an AC criterion. These experiments compared the performance of policies obtained with the proposed ADP approach against that of well-known dispatching rules. The results demonstrated the applicability of the proposed approach under different operational conditions, including different preventive maintenance schedules, random and deterministic processing times at the machines, and different load factors in the system. The results also show that, in general, the policies obtained with the proposed ADP approach performed well compared to the dispatching rules considered, and that under certain operational conditions the ADP-generated policies can even outperform those rules.

Finally, this dissertation provides experimental results from the application of a simulation-based ADP approach to the optimization of preventive maintenance (PM) schedules in RLM models. The proposed approach utilizes an Actor-Critic architecture and the so-called post-decision state variable to define the actor in the ADP architecture. As an illustrative example, simulation experiments were conducted with the Intel Mini-Fab model.
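As a rough illustration of how a post-decision state can define the actor in such an architecture, the following is a minimal Python sketch under purely illustrative assumptions: the toy WIP/machine-wear dynamics, the fixed PM cost, the lookup-table critic, and all parameter values are introduced here for exposition only and are not the model or implementation used in the dissertation.

```python
import random
from collections import defaultdict

# Minimal sketch (illustrative assumptions throughout) of an actor defined via a
# post-decision state: actions are ranked by the approximate value of the state
# *after* the decision but *before* the next random event, so no expectation over
# exogenous events is needed at decision time.

ALPHA, GAMMA, EPS = 0.05, 0.95, 0.1
ACTIONS = [0, 1]            # 0: keep producing, 1: start preventive maintenance
PM_COST = 2.0               # assumed fixed cost of a maintenance action

def cost(state, action):
    """One-stage cost: holding cost (total WIP) plus PM cost if PM is chosen."""
    wip, wear = state
    return wip + (PM_COST if action == 1 else 0.0)

def post_decision(state, action):
    """Deterministic effect of the decision (the post-decision state)."""
    wip, wear = state
    return (wip, 0) if action == 1 else (wip, wear)   # PM resets machine wear

def exogenous(post):
    """Random arrivals/completions occurring after the decision (toy dynamics)."""
    wip, wear = post
    wip += 1 if random.random() < 0.5 else 0            # job arrival
    if wip > 0 and wear < 5 and random.random() < 0.7:  # completion unless worn out
        wip -= 1
    return (min(wip, 20), min(wear + 1, 5))

V = defaultdict(float)       # lookup-table critic over post-decision states

def actor(state):
    """Greedy actor: immediate cost plus discounted value of the post-decision state."""
    return min(ACTIONS, key=lambda a: cost(state, a) + GAMMA * V[post_decision(state, a)])

def train(episodes=2000, horizon=200):
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(horizon):
            a = actor(state) if random.random() > EPS else random.choice(ACTIONS)
            post = post_decision(state, a)
            nxt = exogenous(post)
            # sampled value of the next pre-decision state, used to update the critic
            v_hat = min(cost(nxt, b) + GAMMA * V[post_decision(nxt, b)] for b in ACTIONS)
            V[post] += ALPHA * (v_hat - V[post])
            state = nxt

if __name__ == "__main__":
    train()
    print("greedy action at (wip=8, wear=4):", actor((8, 4)))
```

The point of the post-decision state here is that the actor can rank actions with a simple deterministic lookup, without averaging over the random arrivals and completions that occur after the decision.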
Results from the Mini-Fab experiments demonstrated that the ADP-generated PM policies significantly reduced both the average work-in-process and the average cycle time when compared to selected fixed PM policies.
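For context on the optimality equations mentioned above: after uniformization, the control problem reduces to an equivalent discrete-time Markov decision process whose optimality equations take the standard forms sketched below. The notation is generic and not reproduced from the dissertation, with x the system state, u an admissible control, g the suitably rescaled one-stage cost, and p(y | x, u) the uniformized transition probabilities.

```latex
% Discounted-cost (DC) criterion: with uniformization rate \Lambda and
% continuous-time discount rate \beta, the equivalent discrete-time
% discount factor is \alpha = \Lambda / (\beta + \Lambda), and
\[
J^{*}(x) = \min_{u \in U(x)} \Big\{ g(x,u) + \alpha \sum_{y} p(y \mid x,u)\, J^{*}(y) \Big\}.
\]
% Average-cost (AC) criterion: \rho^{*} is the optimal average cost per
% stage and h is the relative (differential) cost function,
\[
\rho^{*} + h(x) = \min_{u \in U(x)} \Big\{ g(x,u) + \sum_{y} p(y \mid x,u)\, h(y) \Big\}.
\]
```

Equations of this type are the source of the structural properties that, as noted in the abstract, facilitate the definition of the actor in the proposed ADP algorithm.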

Bibliographic record

  • Author: Ramirez-Hernandez, Jose A.
  • Author affiliation: University of Cincinnati.
  • Degree grantor: University of Cincinnati.
  • Subject: Engineering, Electronics and Electrical; Operations Research.
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 210 p.
  • Total pages: 210
  • Format: PDF
  • Language: English

