Learning to Explain: Datasets and Models for Identifying Valid Reasoning Chains in Multihop Question-Answering

Abstract

Despite rapid progress in multihop question-answering (QA), models still have trouble explaining why an answer is correct, and only limited explanation training data is available to learn from. To address this, we introduce three explanation datasets in which explanations formed from corpus facts are annotated. Our first dataset, eQASC, contains over 98K explanation annotations for the multihop question-answering dataset QASC, and is the first that annotates multiple candidate explanations for each answer. The second dataset, eQASC-perturbed, is constructed by crowd-sourcing perturbations (while preserving their validity) of a subset of explanations in QASC, to test the consistency and generalization of explanation prediction models. The third dataset, eOBQA, is constructed by adding explanation annotations to the OBQA dataset, to test the generalization of models trained on eQASC. We show that this data can be used to significantly improve explanation quality (+14% absolute F1 over a strong retrieval baseline) using a BERT-based classifier, although performance remains below the upper bound, offering a new challenge for future research. We also explore a delexicalized chain representation in which repeated noun phrases are replaced by variables, thus turning them into generalized reasoning chains (for example: "X is a Y" AND "Y has Z" IMPLIES "X has Z"). We find that generalized chains maintain performance while also being more robust to certain perturbations.

Bibliographic details

Source
Conference on Empirical Methods in Natural Language Processing | 2020 | pp. 137-150 | 14 pages
Authors
Harsh Jhamtani; Peter Clark
Original format: PDF
