首页> 中文期刊> 《电脑和通信(英文)》 >Online 3D Packing Problem Based on Bi-Value Guidance

Online 3D Packing Problem Based on Bi-Value Guidance

         

摘要

The online 3D packing problem has received increasing attention in recent years due to its practical value. However, the problem itself possesses some peculiar properties, such as sequential decision-making and the large size of the state space, which have made the use of reinforcement learning with Markov decision processes a popular approach for solving this problem. In this paper, we focus on the problem of high variance in value estimation caused by reward uncertainty in the presence of highly uncertain dynamics. To address this, proposed a solution based on auxiliary tasks and intrinsic rewards for the online 3D bin packing problem, guided by a binary-valued network, to assist the agent in learning the policy within the framework of actor-critic deep reinforcement learning. Specifically, the maintenance of two-valued networks and the utilization of multi-valued network estimates are employed to replace the original value estimates, aiming to provide better guidance for the learning of policy networks. Experimentally, it has been demonstrated that our model can achieve more robust learning and outperform previous works in terms of performance.

著录项

相似文献

  • 中文文献
  • 外文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号