Conference on Empirical Methods in Natural Language Processing

Autoregressive Knowledge Distillation through Imitation Learning

Abstract

The performance of autoregressive models on natural language generation tasks has dramatically improved due to the adoption of deep, self-attentive architectures. However, these gains have come at the cost of hindering inference speed, making state-of-the-art models cumbersome to deploy in real-world, time-sensitive settings. We develop a compression technique for autoregressive models that is driven by an imitation learning perspective on knowledge distillation. The algorithm is designed to address the exposure bias problem. On prototypical language generation tasks such as translation and summarization, our method consistently outperforms other distillation algorithms, such as sequence-level knowledge distillation. Student models trained with our method attain 1.4 to 4.8 BLEU/ROUGE points higher than those trained from scratch, while increasing inference speed by up to 14 times in comparison to the teacher model.
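
To make the abstract's core idea concrete, below is a minimal, hypothetical PyTorch sketch of imitation-learning-style distillation for an autoregressive model. It is not the paper's exact algorithm: TinyLM, student_rollout, the loss choice, and all sizes are illustrative placeholders. The point it illustrates is that the student samples its own prefixes and the teacher's next-token distributions supervise those prefixes, so the student is corrected on the states it actually visits during free-running generation, which is the exposure-bias mismatch that teacher-forced, word-level distillation leaves unaddressed.

import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 1000   # placeholder vocabulary size
BOS_ID = 0          # placeholder beginning-of-sequence token

class TinyLM(nn.Module):
    """Toy autoregressive decoder standing in for the teacher / student."""
    def __init__(self, hidden):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_SIZE, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, VOCAB_SIZE)

    def forward(self, tokens):            # tokens: (batch, length)
        hidden_states, _ = self.rnn(self.emb(tokens))
        return self.out(hidden_states)    # next-token logits: (batch, length, vocab)

def student_rollout(student, batch_size, max_len):
    """The student samples its own prefixes (the on-policy state distribution)."""
    tokens = torch.full((batch_size, 1), BOS_ID, dtype=torch.long)
    with torch.no_grad():
        for _ in range(max_len - 1):
            next_probs = F.softmax(student(tokens)[:, -1], dim=-1)
            next_tok = torch.multinomial(next_probs, 1)
            tokens = torch.cat([tokens, next_tok], dim=1)
    return tokens

def distill_step(student, teacher, optimizer, batch_size=8, max_len=16):
    """One update: the teacher acts as an oracle on the student's own prefixes,
    so mistakes made during free-running generation are directly corrected."""
    prefixes = student_rollout(student, batch_size, max_len)
    with torch.no_grad():
        teacher_logits = teacher(prefixes)
    student_logits = student(prefixes)
    loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

teacher, student = TinyLM(hidden=512), TinyLM(hidden=64)   # large teacher, small student
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
loss = distill_step(student, teacher, optimizer)

For contrast, standard word-level distillation would compute the same teacher-matching term only on reference (ground-truth) prefixes, and sequence-level knowledge distillation would instead train the student on the teacher's decoded outputs; conditioning the loss on student-generated prefixes is what gives the imitation-learning view, with the teacher playing the role of an expert oracle.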
