Annual Meeting of the Association for Computational Linguistics

A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation



Abstract

In this paper, we introduce a novel methodology to efficiently construct a corpus for question answering over structured data. To this end, we introduce an intermediate representation called Operation Trees (OT), which is based on the logical query plan in a database. This representation allows us to invert the annotation process without losing flexibility in the types of queries that we generate. Furthermore, it allows for fine-grained alignment of query tokens to OT operations. In our method, we randomly generate OTs from a context-free grammar. Afterwards, annotators write the natural language question that each OT represents. Finally, the annotators assign the tokens of the question to the OT operations. We apply the method to create a new corpus, OTTA (Operation Trees and Token Assignment), a large semantic parsing corpus for evaluating natural language interfaces to databases. We compare OTTA to Spider and LC-QuaD 2.0 and show that our methodology more than triples the annotation speed while maintaining the complexity of the queries. Finally, we train a state-of-the-art semantic parsing model on our data and show that our corpus is a challenging dataset and that the token alignment can be leveraged to increase the performance significantly.
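The generation step described in the abstract, randomly sampling OTs from a context-free grammar, can be illustrated with a minimal sketch. The grammar below is a hypothetical toy: the nonterminals (`QUERY`, `RESULT`, `SET`), the operation names, and the `max_depth` cutoff are assumptions chosen for illustration and are not taken from the paper.

```python
import random

# Hypothetical toy grammar (an assumption, not the paper's grammar).
# Each nonterminal maps to a list of productions; a production is
# (operation_name, [child nonterminals]).
GRAMMAR = {
    "QUERY": [("Done", ["RESULT"])],
    "RESULT": [("Projection", ["SET"]), ("Count", ["SET"])],
    "SET": [
        ("TableScan", []),
        ("Filter", ["SET"]),
        ("Join", ["SET", "SET"]),
    ],
}


def generate(symbol, rng, max_depth):
    """Randomly expand `symbol` into a nested (operation, children) tuple."""
    productions = GRAMMAR[symbol]
    if max_depth <= 1:
        # Depth budget exhausted: prefer productions without children,
        # which guarantees that the recursion terminates.
        leaves = [p for p in productions if not p[1]]
        productions = leaves or productions
    op, child_symbols = rng.choice(productions)
    children = [generate(c, rng, max_depth - 1) for c in child_symbols]
    return (op, children)


def depth(tree):
    """Number of nodes on the longest root-to-leaf path."""
    _, children = tree
    return 1 + max((depth(c) for c in children), default=0)


def render(tree, indent=0):
    """Return the tree as a list of indented lines, one operation per line."""
    op, children = tree
    lines = ["  " * indent + op]
    for child in children:
        lines.extend(render(child, indent + 1))
    return lines


# Sample one tree with a fixed seed so the result is reproducible.
tree = generate("QUERY", random.Random(0), max_depth=5)
print("\n".join(render(tree)))
```

In the workflow the abstract outlines, a tree like this would be shown to an annotator, who writes the matching question and then assigns its tokens to the individual operations. The depth cutoff here is only a simple way to keep sampled trees small.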
