Journal: Real-time systems

DeepRT: predictable deep learning inference for cyber-physical systems



Abstract

Recently, deep learning has been changing the way mobile and embedded devices see, hear, and understand the world. When deep learning is deployed to such systems, they are expected to perform inference tasks in a timely and energy-efficient manner. A large body of research has focused on taming deep learning for resource-constrained devices, either by compressing deep learning models or by devising hardware accelerators. However, these approaches have focused on providing "best-effort" performance for such devices. In this paper, we present the design and implementation of DeepRT, a novel deep learning inference runtime. Unlike previous approaches, DeepRT focuses on supporting predictable temporal and spatial inference performance when deep learning models are used in unpredictable and resource-constrained environments. In particular, DeepRT applies formal control theory to support Quality-of-Service (QoS) management that can dynamically minimize the tardiness of inference tasks at runtime while achieving high energy efficiency. Further, DeepRT determines a proper level of compression for deep learning models at runtime according to memory availability and users' QoS requirements, yielding suitable trade-offs between memory savings and loss of inference accuracy. We evaluate DeepRT on a wide range of deep learning models under various conditions. The experimental results show that DeepRT supports the timeliness of inference tasks in a robust and energy-efficient manner.
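To make the control-theoretic idea in the abstract concrete, the following is a minimal illustrative sketch, not DeepRT's actual design: a proportional-integral (PI) feedback loop that observes the tardiness of recent inference tasks and maps the control signal onto a discrete model-compression level. All names and constants (`CompressionController`, the gains, the level mapping) are hypothetical.

```python
class CompressionController:
    """Toy PI controller: persistent tardiness drives the compression level up,
    trading inference accuracy for shorter, more predictable latency."""

    def __init__(self, kp=0.5, ki=0.1, num_levels=4):
        self.kp, self.ki = kp, ki
        self.num_levels = num_levels  # level 0 = uncompressed, 3 = most compressed
        self.integral = 0.0           # accumulated tardiness (integral term)

    def update(self, tardiness_ms):
        """Take the latest measured tardiness (ms past deadline, 0 if met)
        and return the compression level to use for the next inference."""
        self.integral += tardiness_ms
        signal = self.kp * tardiness_ms + self.ki * self.integral
        # Quantize the continuous control signal onto the available levels.
        return max(0, min(self.num_levels - 1, round(signal / 10.0)))


ctrl = CompressionController()
# Feed a few simulated tardiness measurements (ms past the deadline).
levels = [ctrl.update(t) for t in [0.0, 5.0, 12.0, 3.0]]
```

A real runtime would also weigh memory availability and per-level accuracy loss when choosing a level, as the abstract describes; the PI loop above only captures the tardiness-feedback aspect.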


