
Online Distilling from Checkpoints for Neural Machine Translation

Abstract

Current predominant neural machine translation (NMT) models often have a deep structure with large numbers of parameters, making these models hard to train and prone to over-fitting. A common practice is to utilize a validation set to evaluate the training process and select the best checkpoint. Average and ensemble techniques on checkpoints can lead to further performance improvement. However, as these methods do not affect the training process, the system performance is restricted to the checkpoints generated in the original training procedure. In contrast, we propose an online knowledge distillation method. Our method on-the-fly generates a teacher model from checkpoints, guiding the training process to obtain better performance. Experiments on several datasets and language pairs show steady improvement over a strong self-attention-based baseline system. We also provide an analysis of over-fitting in a data-limited setting. Furthermore, our method leads to an improvement in a machine reading experiment as well.
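The core idea of online distilling from checkpoints can be illustrated as a training loop in which the best checkpoint so far is frozen and reused as a teacher for the subsequent training steps. The PyTorch-style sketch below is an illustrative assumption, not the authors' released implementation: the model, data iterators, mixing weight `alpha`, and temperature are placeholders, and the per-token sequence dimensions of a real NMT model are flattened for brevity.

```python
# Minimal sketch of online knowledge distillation from checkpoints (assumed
# implementation): whenever validation loss improves, a frozen copy of the
# current weights becomes the "teacher", and later updates mix the usual
# cross-entropy loss with a KL term toward the teacher's output distribution.
import copy
import torch
import torch.nn.functional as F


def distill_step(student, teacher, batch, targets, alpha=0.5, temperature=1.0):
    """One training step, optionally guided by a checkpoint-based teacher."""
    logits = student(batch)                              # (N, vocab), flattened
    loss = F.cross_entropy(logits, targets)              # standard NMT loss
    if teacher is not None:
        with torch.no_grad():
            t_logits = teacher(batch)
        # Soft-label KL divergence toward the teacher's distribution.
        kd = F.kl_div(
            F.log_softmax(logits / temperature, dim=-1),
            F.softmax(t_logits / temperature, dim=-1),
            reduction="batchmean",
        )
        loss = (1 - alpha) * loss + alpha * kd
    return loss


def train(student, train_batches, val_batches, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    teacher, best_val = None, float("inf")
    for _ in range(epochs):
        student.train()
        for batch, targets in train_batches:
            opt.zero_grad()
            distill_step(student, teacher, batch, targets).backward()
            opt.step()
        # Evaluate the new checkpoint; if it is the best so far,
        # promote a frozen copy of it to be the next teacher.
        student.eval()
        with torch.no_grad():
            val = sum(F.cross_entropy(student(b), t).item()
                      for b, t in val_batches) / len(val_batches)
        if val < best_val:
            best_val = val
            teacher = copy.deepcopy(student).eval()
            for p in teacher.parameters():
                p.requires_grad_(False)
    return student
```

Because the teacher is refreshed whenever validation improves, the distillation signal is not limited to the checkpoints of a fixed, already-completed run, which is the property the abstract contrasts against checkpoint averaging and ensembling.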
