Conference on Uncertainty in Artificial Intelligence

Towards Flatter Loss Surface via Nonmonotonic Learning Rate Scheduling

Abstract

While optimizing deep neural networks with stochastic gradient descent has shown great performance in practice, the rule for setting the step size (i.e., learning rate) of gradient descent is not well studied. Although intriguing learning rate rules such as Adam (Kingma and Ba, 2014) have since been developed, they concentrate on improving convergence rather than generalization. Recently, the improved generalization associated with flat minima has been revisited, and this line of research points towards promising solutions to many current optimization problems. In this paper, we analyze the flatness of loss surfaces through the lens of robustness to input perturbations and argue that gradient descent should be guided towards flatter regions of the loss surface to achieve better generalization. Finally, we propose a learning rate rule for escaping sharp regions of the loss surface, and we demonstrate the effectiveness of our approach through numerous experiments.
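The abstract does not specify the authors' exact rule. As a minimal sketch of what a nonmonotonic learning rate schedule looks like in general, the Python snippet below implements cosine decay with periodic restarts: the rate falls within each cycle, then jumps back to its base value. The function name nonmonotonic_lr and all constants are illustrative assumptions, not the rule proposed in the paper.

    import math

    def nonmonotonic_lr(step, base_lr=0.1, min_lr=1e-4, cycle_len=1000):
        """Cosine decay with periodic restarts (illustrative only, not the
        paper's rule): the rate falls within each cycle, then jumps back
        up, so the schedule is nonmonotonic over the whole run."""
        t = (step % cycle_len) / cycle_len  # position within current cycle, in [0, 1)
        return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))

    # The rate decays from 0.1 toward 1e-4, then restarts at 0.1 at step 1000.
    for step in (0, 500, 999, 1000):
        print(step, round(nonmonotonic_lr(step), 5))

The periodic jump back to a large step size is what makes such schedules plausible tools for escaping sharp regions: a large step overshoots narrow basins while remaining comparatively stable in wide, flat ones.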
