Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Removing the Training Wheels: A Coreference Dataset that Entertains Humans and Challenges Computers

Abstract

Coreference is a core NLP problem. However, newswire data, the primary source of existing coreference data, lack the richness necessary to truly solve coreference. We present a new domain with denser references (quiz bowl questions) that is challenging and enjoyable to humans, and we use the quiz bowl community to develop a new coreference dataset, together with an annotation framework that can tag any text data with coreferences and named entities. We also successfully integrate active learning into this annotation pipeline to collect documents maximally useful to coreference models. State-of-the-art coreference systems underperform a simple classifier on our new dataset, motivating non-newswire data for future coreference research.