International Conference on Machine Learning

REALM: Retrieval-Augmented Language Model Pre-Training



Abstract

Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring ever-larger networks to cover more facts. To capture knowledge in a more modular and interpretable way, we augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia, used during pre-training, fine-tuning and inference. For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner, using masked language modeling as the learning signal and backpropagating through a retrieval step that considers millions of documents. We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA). We compare against state-of-the-art models for both explicit and implicit knowledge storage on three popular Open-QA benchmarks, and find that we outperform all previous methods by a significant margin (4-16% absolute accuracy), while also providing qualitative benefits such as interpretability and modularity.
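Below is a minimal numerical sketch (not the authors' code) of the retrieve-then-predict factorization the abstract describes: the model marginalizes over retrieved documents, p(y | x) = Σ_z p(z | x) · p(y | z, x), with the retrieval distribution p(z | x) given by a softmax over dense inner-product relevance scores. The embedding dimension, toy corpus, random stand-in encoders, and the `masked_token` index are illustrative assumptions; in REALM the encoders are BERT-style networks and the sum runs over top-k documents found by maximum inner product search over millions of candidates.

```python
# Sketch of REALM-style retrieve-then-predict marginalization (assumed toy setup).
import numpy as np

rng = np.random.default_rng(0)
d = 8            # embedding dimension (assumed)
num_docs = 5     # toy corpus size
vocab = 100      # toy vocabulary for the masked-token prediction

# Stand-ins for Embed_input(x) and Embed_doc(z); in REALM these come from
# BERT-style encoders, here they are fixed random vectors for illustration.
query_embedding = rng.normal(size=d)
doc_embeddings = rng.normal(size=(num_docs, d))

# Stand-in for the reader p(y | z, x): in REALM this is a masked-language-model
# head reading the query concatenated with the retrieved document.
reader_logits = rng.normal(size=(num_docs, vocab))

def softmax(logits, axis=-1):
    logits = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=axis, keepdims=True)

# Retrieval distribution p(z | x): softmax over inner-product relevance scores.
retrieval_scores = doc_embeddings @ query_embedding         # shape (num_docs,)
p_z_given_x = softmax(retrieval_scores)

# Reader distribution p(y | z, x) for each candidate document.
p_y_given_zx = softmax(reader_logits, axis=-1)               # (num_docs, vocab)

# Marginal p(y | x): summing over documents makes the masked-LM loss
# -log p(y | x) differentiable w.r.t. the retrieval scores, which is what lets
# pre-training backpropagate through the retrieval step.
p_y_given_x = p_z_given_x @ p_y_given_zx                     # shape (vocab,)

masked_token = 42  # hypothetical identity of the masked token y
loss = -np.log(p_y_given_x[masked_token])
print(f"p(z|x) = {np.round(p_z_given_x, 3)}")
print(f"masked-LM loss -log p(y|x) = {loss:.3f}")
```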
