Conference on Empirical Methods in Natural Language Processing

Autoregressive Knowledge Distillation through Imitation Learning

Abstract

The performance of autoregressive models on natural language generation tasks has dramatically improved due to the adoption of deep, self-attentive architectures. However, these gains have come at the cost of hindering inference speed, making state-of-the-art models cumbersome to deploy in real-world, time-sensitive settings. We develop a compression technique for autoregressive models that is driven by an imitation learning perspective on knowledge distillation. The algorithm is designed to address the exposure bias problem. On prototypical language generation tasks such as translation and summarization, our method consistently outperforms other distillation algorithms, such as sequence-level knowledge distillation. Student models trained with our method attain 1.4 to 4.8 BLEU/ROUGE points higher than those trained from scratch, while increasing inference speed by up to 14 times in comparison to the teacher model.
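
To make the abstract's core idea concrete, below is a minimal, hypothetical PyTorch sketch of imitation-learning-style distillation for an autoregressive model. It is not the paper's exact algorithm: TinyLM, student_rollout, the loss choice, and all sizes are illustrative placeholders. The point it illustrates is that the student samples its own prefixes and the teacher's next-token distributions supervise those prefixes, so the student is corrected on the states it actually visits during free-running generation, which is the exposure-bias mismatch that teacher-forced, word-level distillation leaves unaddressed.

import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 1000   # placeholder vocabulary size
BOS_ID = 0          # placeholder beginning-of-sequence token

class TinyLM(nn.Module):
    """Toy autoregressive decoder standing in for the teacher / student."""
    def __init__(self, hidden):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_SIZE, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, VOCAB_SIZE)

    def forward(self, tokens):            # tokens: (batch, length)
        hidden_states, _ = self.rnn(self.emb(tokens))
        return self.out(hidden_states)    # next-token logits: (batch, length, vocab)

def student_rollout(student, batch_size, max_len):
    """The student samples its own prefixes (the on-policy state distribution)."""
    tokens = torch.full((batch_size, 1), BOS_ID, dtype=torch.long)
    with torch.no_grad():
        for _ in range(max_len - 1):
            next_probs = F.softmax(student(tokens)[:, -1], dim=-1)
            next_tok = torch.multinomial(next_probs, 1)
            tokens = torch.cat([tokens, next_tok], dim=1)
    return tokens

def distill_step(student, teacher, optimizer, batch_size=8, max_len=16):
    """One update: the teacher acts as an oracle on the student's own prefixes,
    so mistakes made during free-running generation are directly corrected."""
    prefixes = student_rollout(student, batch_size, max_len)
    with torch.no_grad():
        teacher_logits = teacher(prefixes)
    student_logits = student(prefixes)
    loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

teacher, student = TinyLM(hidden=512), TinyLM(hidden=64)   # large teacher, small student
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
loss = distill_step(student, teacher, optimizer)

For contrast, standard word-level distillation would compute the same teacher-matching term only on reference (ground-truth) prefixes, and sequence-level knowledge distillation would instead train the student on the teacher's decoded outputs; conditioning the loss on student-generated prefixes is what gives the imitation-learning view, with the teacher playing the role of an expert oracle.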
