Conference on Uncertainty in Artificial Intelligence

Towards Flatter Loss Surface via Nonmonotonic Learning Rate Scheduling

Abstract

While optimizing deep neural networks with stochastic gradient descent has shown great performance in practice, the rule for setting the step size (i.e., learning rate) of gradient descent is not well studied. Although intriguing learning rate rules such as Adam (Kingma and Ba, 2014) have since been developed, they concentrate on improving convergence rather than generalization. Recently, the improved generalization associated with flat minima has been revisited, and this line of research points towards promising solutions to many current optimization problems. In this paper, we analyze the flatness of loss surfaces through the lens of robustness to input perturbations and argue that gradient descent should be guided towards flatter regions of the loss surface to achieve better generalization. Finally, we propose a learning rate rule for escaping sharp regions of the loss surface, and we demonstrate the effectiveness of our approach through numerous experiments.
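The abstract does not specify the authors' exact rule. As a minimal sketch of what a nonmonotonic learning rate schedule looks like in general, the Python snippet below implements cosine decay with periodic restarts: the rate falls within each cycle, then jumps back to its base value. The function name nonmonotonic_lr and all constants are illustrative assumptions, not the rule proposed in the paper.

    import math

    def nonmonotonic_lr(step, base_lr=0.1, min_lr=1e-4, cycle_len=1000):
        """Cosine decay with periodic restarts (illustrative only, not the
        paper's rule): the rate falls within each cycle, then jumps back
        up, so the schedule is nonmonotonic over the whole run."""
        t = (step % cycle_len) / cycle_len  # position within current cycle, in [0, 1)
        return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))

    # The rate decays from 0.1 toward 1e-4, then restarts at 0.1 at step 1000.
    for step in (0, 500, 999, 1000):
        print(step, round(nonmonotonic_lr(step), 5))

The periodic jump back to a large step size is what makes such schedules plausible tools for escaping sharp regions: a large step overshoots narrow basins while remaining comparatively stable in wide, flat ones.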
