Venue: IEEE International Parallel and Distributed Processing Symposium Workshops

EdgeL^3: Compressing L^3-Net for Mote Scale Urban Noise Monitoring


Abstract

Urban noise sensing on deeply embedded devices at the edge of the Internet of Things (IoT) is challenging, both because sufficiently labeled training data is lacking and because device resources are quite limited. Look, Listen, and Learn (L^3), a recently proposed state-of-the-art transfer learning technique, mitigates the first challenge by training self-supervised deep audio embeddings through binary Audio-Visual Correspondence (AVC); the resulting embeddings can be used to train a variety of downstream audio classification tasks. However, with close to 4.7 million parameters, the multi-layer L^3-Net CNN is still prohibitively expensive to run on small edge devices such as "motes", which use a single microcontroller and limited memory to achieve long-lived self-powered operation. In this paper, we comprehensively explore the feasibility of compressing L^3-Net for mote-scale inference. We use pruning, ablation, and knowledge distillation techniques to show that the originally proposed L^3-Net architecture is substantially overparameterized, not only for AVC but also for the target task of sound classification, as evaluated on two popular downstream datasets. Our findings demonstrate the value of fine-tuning and knowledge distillation in regaining the performance lost through aggressive compression strategies. Finally, we present EdgeL^3, the first L^3-Net reference model compressed by 1-2 orders of magnitude for real-time urban noise monitoring on resource-constrained edge devices, which fits in just 0.4 MB of memory through half-precision floating point representation.
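
The memory figures in the abstract can be related by simple arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes that the footprint counts weights alone, that 1 MB means 10^6 bytes, and that the uncompressed baseline is stored in 32-bit floats. The helper names `weight_footprint_mb` and `params_that_fit` are hypothetical.

```python
# Back-of-the-envelope weight-memory arithmetic (assumptions, not paper results):
#   - only model weights are counted (no activations, buffers, or code)
#   - 1 MB = 10**6 bytes
#   - the uncompressed baseline is stored as float32
BYTES_FP32 = 4  # IEEE 754 single precision
BYTES_FP16 = 2  # IEEE 754 half precision

def weight_footprint_mb(num_params: int, bytes_per_param: int) -> float:
    """Memory needed to store num_params weights, in MB."""
    return num_params * bytes_per_param / 1e6

def params_that_fit(budget_mb: float, bytes_per_param: int) -> int:
    """Largest number of weights that fits in budget_mb."""
    return int(round(budget_mb * 1e6)) // bytes_per_param

L3_PARAMS = 4_700_000   # "close to 4.7 million parameters" (from the abstract)
EDGE_BUDGET_MB = 0.4    # "just 0.4 MB" in half precision (from the abstract)

full_fp32_mb = weight_footprint_mb(L3_PARAMS, BYTES_FP32)
full_fp16_mb = weight_footprint_mb(L3_PARAMS, BYTES_FP16)
edge_params = params_that_fit(EDGE_BUDGET_MB, BYTES_FP16)
param_ratio = L3_PARAMS / edge_params

print(f"L^3-Net weights, float32: {full_fp32_mb:.1f} MB")
print(f"L^3-Net weights, float16: {full_fp16_mb:.1f} MB")
print(f"Weights that fit in {EDGE_BUDGET_MB} MB at float16: {edge_params:,}")
print(f"Implied parameter reduction: about {param_ratio:.1f}x")
```

Under these assumptions, a 0.4 MB half-precision budget holds roughly 200,000 weights, a reduction of a bit more than one order of magnitude from 4.7 million, which is consistent with the stated 1-2 orders of magnitude. The exact parameter count of EdgeL^3 is not given in the abstract.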
