Conference on Uncertainty in Artificial Intelligence

Averaging Weights Leads to Wider Optima and Better Generalization

Abstract

Deep neural networks are typically trained by optimizing a loss function with an SGD variant, in conjunction with a decaying learning rate, until convergence. We show that simple averaging of multiple points along the trajectory of SGD, with a cyclical or constant learning rate, leads to better generalization than conventional training. We also show that this Stochastic Weight Averaging (SWA) procedure finds much broader optima than SGD, and approximates the recent Fast Geometric Ensembling (FGE) approach with a single model. Using SWA we achieve notable improvement in test accuracy over conventional SGD training on a range of state-of-the-art residual networks, PyramidNets, DenseNets, and Shake-Shake networks on CIFAR-10, CIFAR-100, and ImageNet. In short, SWA is extremely easy to implement, improves generalization, and has almost no computational overhead.
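For illustration, the averaging procedure described in the abstract can be sketched in a few lines of PyTorch: run SGD with a constant learning rate and keep a running equal average of the weight snapshots collected along the trajectory. This is only a minimal sketch; the toy data, model, and hyperparameters (make_model, swa_start, once-per-epoch averaging) are illustrative assumptions, not details taken from the paper.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Toy regression data (a stand-in for the image classification benchmarks).
    X = torch.randn(512, 10)
    y = X @ torch.randn(10, 1) + 0.1 * torch.randn(512, 1)

    def make_model():
        return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

    model = make_model()
    opt = torch.optim.SGD(model.parameters(), lr=0.05)  # constant learning rate
    loss_fn = nn.MSELoss()

    swa_state = None   # running equal average of the weights visited by SGD
    n_averaged = 0
    swa_start = 50     # start averaging after a warm-up phase (illustrative choice)

    for epoch in range(200):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()

        # Fold the current weights into the running average. Here this happens
        # once per epoch; with a cyclical schedule it would happen at the end
        # of each learning-rate cycle.
        if epoch >= swa_start:
            snapshot = {k: v.detach().clone() for k, v in model.state_dict().items()}
            if swa_state is None:
                swa_state = snapshot
            else:
                for k in swa_state:
                    swa_state[k] += (snapshot[k] - swa_state[k]) / (n_averaged + 1)
            n_averaged += 1

    # Load the averaged weights into a fresh copy of the network. For models
    # with BatchNorm, the running statistics would need to be recomputed with
    # one extra pass over the training data before evaluation.
    swa_model = make_model()
    swa_model.load_state_dict(swa_state)
    print("final SGD loss:", loss_fn(model(X), y).item())
    print("SWA loss:      ", loss_fn(swa_model(X), y).item())

Recent PyTorch releases also provide torch.optim.swa_utils (AveragedModel, SWALR, update_bn), which handle the same weight averaging and the batch-norm statistics update without this manual bookkeeping.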
