Neurocomputing

Theoretical analysis of batch and on-line training for gradient descent learning in neural networks



Abstract

In this study, we theoretically analyze two essential training schemes for gradient descent learning in neural networks: batch and on-line training. The convergence properties of the two schemes applied to quadratic loss functions are analytically investigated. We quantify the convergence of each training scheme to the optimal weight using the absolute value of the expected difference (Measure 1) and the expected squared difference (Measure 2) between the optimal weight and the weight computed by the scheme. Although on-line training has several advantages over batch training with respect to the first measure, it does not converge to the optimal weight with respect to the second measure if the variance of the per-instance gradient remains constant. However, if the variance decays exponentially, then on-line training converges to the optimal weight with respect to Measure 2. Our analysis reveals the exact degrees to which the training set size, the variance of the per-instance gradient, and the learning rate affect the rate of convergence for each scheme.
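The contrast between the two measures can be illustrated numerically. The following is a minimal Monte Carlo sketch (not the paper's derivation) for a scalar quadratic loss l_i(w) = (w - x_i)^2 / 2, where the per-instance gradient has constant variance around the batch gradient; all names and parameter values (w_star, sigma, eta, N, epochs, n_runs) are illustrative assumptions chosen for the demo.

import numpy as np

rng = np.random.default_rng(0)
w_star, sigma, eta = 1.0, 0.5, 0.1   # instance mean, per-instance gradient noise, learning rate
N, epochs, n_runs = 50, 40, 500      # training-set size, passes over the data, Monte Carlo runs

batch_err = np.zeros(n_runs)
online_err = np.zeros(n_runs)
for r in range(n_runs):
    x = rng.normal(w_star, sigma, size=N)    # training instances
    w_opt = x.mean()                         # minimiser of the training-set quadratic loss
    wb = wo = 0.0
    for _ in range(epochs):
        wb -= eta * (wb - w_opt)             # batch: one step per pass, using the full-batch gradient
        for xi in rng.permutation(x):        # on-line: one step per instance, using a noisy gradient
            wo -= eta * (wo - xi)
    batch_err[r], online_err[r] = wb - w_opt, wo - w_opt

for name, e in [("batch", batch_err), ("on-line", online_err)]:
    # Measure 1: |E[w - w_opt]|   Measure 2: E[(w - w_opt)^2]
    print(f"{name:8s}  Measure 1 = {abs(e.mean()):.5f}   Measure 2 = {np.mean(e**2):.5f}")

In this toy setting, on-line training drives Measure 1 down faster (it takes N updates per pass), while its Measure 2 plateaus at a level governed by the learning rate and the constant gradient variance; shrinking sigma (or eta) geometrically across passes makes Measure 2 decay as well, mirroring the exponentially decaying-variance case discussed in the abstract.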

