Source: Sensors (Basel, Switzerland)

Intelligent Decision-Making of Scheduling for Dynamic Permutation Flowshop via Deep Reinforcement Learning



Abstract

Dynamic scheduling problems have received increasing attention in recent years because of their practical implications. To realize real-time, intelligent decision-making in dynamic scheduling, we studied the dynamic permutation flowshop scheduling problem (PFSP) with new job arrivals using deep reinforcement learning (DRL). A system architecture for solving the dynamic PFSP with DRL is proposed, and a mathematical model that minimizes the total tardiness cost is established. The DRL-based intelligent scheduling system is then modeled, including the design of its state features, actions, and reward. The advantage actor-critic (A2C) algorithm is adapted to train the scheduling agent. The learning curve indicates that the agent learned to generate better solutions efficiently during training. Extensive experiments compare the A2C-based scheduling agent against each individual action, other DRL algorithms, and meta-heuristics. The results show that the A2C-based agent performs well in terms of solution quality, CPU time, and generalization. Notably, the trained agent generates a scheduling action in only 2.16 ms on average, which is almost instantaneous and fast enough for real-time scheduling. Our work can help to build a self-learning, real-time optimizing, and intelligent decision-making scheduling system.
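To make the objective concrete, the following is a minimal sketch of how a total tardiness cost could be evaluated for one job permutation in a static permutation flowshop. It is not the paper's model: the function name `total_tardiness_cost`, the per-job `weight` values, and the sample data are all assumptions made for illustration, and new job arrivals (the dynamic part of the problem) are not modeled.

```python
from typing import Sequence


def total_tardiness_cost(
    proc: Sequence[Sequence[float]],  # proc[j][m]: time of job j on machine m (assumed layout)
    due: Sequence[float],             # due[j]: due date of job j
    weight: Sequence[float],          # weight[j]: hypothetical unit tardiness cost of job j
    order: Sequence[int],             # job permutation, identical on every machine
) -> float:
    """Weighted total tardiness of a permutation flowshop schedule."""
    n_machines = len(proc[0])
    # Time at which each machine finishes the previously sequenced job.
    machine_free = [0.0] * n_machines
    cost = 0.0
    for j in order:
        done_upstream = 0.0  # completion time of job j on the previous machine
        for m in range(n_machines):
            # A job starts once it has left the previous machine
            # and the current machine is idle.
            start = max(done_upstream, machine_free[m])
            done_upstream = start + proc[j][m]
            machine_free[m] = done_upstream
        # done_upstream now holds the completion time on the last machine.
        cost += weight[j] * max(0.0, done_upstream - due[j])
    return cost


# Illustrative data only: 3 jobs, 2 machines.
proc = [[3, 2], [1, 4], [2, 1]]
due = [5, 6, 8]
weight = [1, 2, 1]

cost_a = total_tardiness_cost(proc, due, weight, [0, 1, 2])  # 8.0
cost_b = total_tardiness_cost(proc, due, weight, [1, 0, 2])  # 2.0
```

On this toy instance, sequencing job 1 first lowers the cost from 8.0 to 2.0. A scheduling agent of the kind described in the abstract would be choosing among such sequencing decisions as jobs arrive.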
