Theoretical Analysis of Function of Derivative Term in On-Line Gradient Descent Learning

Abstract

In on-line gradient descent learning, the local property of the derivative term of the output can slow convergence. Improving the derivative term, for example by using the natural gradient, has been proposed to speed up convergence. Besides such sophisticated methods, a "simple method" that replaces the derivative term with a constant has been proposed and shown to greatly increase convergence speed. Although this phenomenon has been analyzed empirically, theoretical analysis is required to show its generality. In this paper, we theoretically analyze the effect of using the simple method. Our results show that, with the simple method, the generalization error decreases faster than with the true gradient descent method when the learning step is smaller than the optimum value η_(opt). When the learning step is larger than η_(opt), the generalization error decreases more slowly with the simple method, and the residual error is larger than with the true gradient descent method. Moreover, when there is output noise, η_(opt) is no longer optimum; thus, the simple method is not robust in noisy circumstances.
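
To make the contrast concrete, the following is a minimal sketch of teacher-student on-line learning for a single sigmoidal unit, comparing the true gradient rule (which keeps the derivative term g'(u) in the update) with the simple method (which replaces that term with a constant). The tanh activation, the learning step eta, the input dimension n, and the teacher-student setup are illustrative assumptions for this sketch, not the exact model analyzed in the paper.

    import numpy as np

    # Sigmoidal output function and its derivative (illustrative choice).
    def g(u):
        return np.tanh(u)

    def g_prime(u):
        return 1.0 - np.tanh(u) ** 2

    rng = np.random.default_rng(0)
    n = 50                                        # input dimension (assumed)
    w_teacher = rng.normal(size=n) / np.sqrt(n)   # teacher (true) weights
    w_true = rng.normal(size=n) / np.sqrt(n)      # student using true gradient descent
    w_simple = w_true.copy()                      # student using the "simple method"
    eta = 0.1                                     # learning step

    for _ in range(10_000):
        x = rng.normal(size=n)                    # one on-line example per step
        t = g(w_teacher @ x)                      # teacher output (no output noise here)

        # True gradient descent: the update keeps the derivative term g'(w.x).
        u = w_true @ x
        w_true += eta * (t - g(u)) * g_prime(u) * x

        # Simple method: the derivative term is replaced by a constant (here 1).
        v = w_simple @ x
        w_simple += eta * (t - g(v)) * 1.0 * x

    # Rough proxy for generalization error: distance of each student from the teacher.
    print(np.linalg.norm(w_true - w_teacher), np.linalg.norm(w_simple - w_teacher))

With eta below the optimum value, the simple-method student typically approaches the teacher faster; with a larger eta, or with output noise added to t, its residual error is larger, which is the regime the paper's analysis addresses.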