JMLR: Workshop and Conference Proceedings

Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations


Abstract

Deep neural networks have become the state-of-the-art models in numerous machine learning tasks. However, general guidance for network architecture design is still missing. In our work, we bridge deep neural network design with numerical differential equations. We show that many effective networks, such as ResNet, PolyNet, FractalNet and RevNet, can be interpreted as different numerical discretizations of differential equations. This finding brings us a new perspective on the design of effective deep architectures. We can take advantage of the rich knowledge in numerical analysis to guide us in designing new and potentially more effective deep networks. As an example, we propose a linear multi-step architecture (LM-architecture) inspired by the linear multi-step method for solving ordinary differential equations. The LM-architecture is an effective structure that can be used on any ResNet-like network. In particular, we demonstrate that LM-ResNet and LM-ResNeXt (i.e. the networks obtained by applying the LM-architecture to ResNet and ResNeXt respectively) can achieve noticeably higher accuracy than ResNet and ResNeXt on both CIFAR and ImageNet with comparable numbers of trainable parameters. Moreover, on both CIFAR and ImageNet, LM-ResNet/LM-ResNeXt can significantly compress (>50%) the original networks while maintaining similar performance. This can be explained mathematically using the concept of the modified equation from numerical analysis. Last but not least, we also establish a connection between stochastic control and noise injection in the training process, which helps to improve the generalization of the networks. Furthermore, by relating stochastic training strategies to stochastic dynamical systems, we can easily apply stochastic training to networks with the LM-architecture. As an example, we introduce stochastic depth to LM-ResNet and achieve a significant improvement over the original LM-ResNet on CIFAR10.
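To make the ODE interpretation concrete, the sketch below contrasts the two update rules the abstract refers to: a ResNet block, which matches one forward-Euler step on du/dt = f(u), and the LM-architecture block, a two-step linear multi-step update with a trainable coefficient k. This is a minimal NumPy illustration under stated assumptions: the two-layer residual branch f, the fixed scalar value of k, and falling back to a plain Euler step at the first layer are all illustrative choices, not the paper's exact configuration.

```python
import numpy as np

# Illustrative stand-in for a residual branch: a small two-layer MLP.
rng = np.random.default_rng(0)
dim = 8
W1 = rng.normal(scale=0.1, size=(dim, dim))
W2 = rng.normal(scale=0.1, size=(dim, dim))

def f(u):
    """Residual branch f(u); in a real network this is the trainable block."""
    return W2 @ np.maximum(W1 @ u, 0.0)

def resnet_step(u):
    """ResNet block = forward Euler with unit step: u_{n+1} = u_n + f(u_n)."""
    return u + f(u)

def lm_step(u_n, u_prev, k):
    """LM-architecture block (two-step linear multi-step scheme):
    u_{n+1} = (1 - k) * u_n + k * u_{n-1} + f(u_n),
    where k is a per-block trainable parameter (a fixed float here)."""
    return (1.0 - k) * u_n + k * u_prev + f(u_n)

u0 = rng.normal(size=dim)
u1 = resnet_step(u0)           # first layer has no history: plain Euler step
u2 = lm_step(u1, u0, k=-0.1)   # later layers blend the two previous states
```

The extra u_{n-1} term is what distinguishes the LM-architecture from a plain residual connection: each block sees two previous states instead of one, mirroring how linear multi-step ODE solvers gain accuracy over one-step methods at essentially no extra cost in trainable parameters.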
