International Conference on Recent Advances in Natural Language Processing

Turning Silver into Gold: Error-Focused Corpus Reannotation with Active Learning

Abstract

While high-quality gold-standard annotated corpora are crucial for most tasks in natural language processing, many annotated corpora published in recent years, whether created by annotators or by tools, contain noisy annotations. These corpora are better viewed as silver than as gold standards, even when they are used in evaluation campaigns or to compare systems' performance. As upgrading a silver corpus to gold level is still a challenge, we explore the application of active learning techniques to detect errors, using four datasets designed for document classification and part-of-speech tagging. Our results show that the proposed method for the seeding step improves the chance of finding incorrect annotations by a factor of 2.73 compared to random selection, a 14.71% increase over the baseline methods. Our query method increases error detection precision by a factor of 1.78 on average against random selection, an increase of 61.82% compared to other query approaches.
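The abstract reports its results as factors relative to random selection. The sketch below illustrates one plausible reading of that measure: rank instances by how strongly a model disagrees with the existing label, then divide the precision among the top-ranked suspects by the corpus error rate. The scoring rule, the function names `rank_suspects` and `lift_over_random`, and the toy data are all assumptions made for illustration; they are not the seeding or query methods of the paper, which the abstract does not specify.

```python
from typing import Dict, List, Sequence, Set


def rank_suspects(probs: Sequence[Dict[str, float]], labels: Sequence[str]) -> List[int]:
    """Return instance indices ordered from most to least suspicious.

    Suspicion is 1 - P(model assigns the existing label). This scoring rule is
    an illustrative assumption, not the query strategy from the paper.
    """
    scores = [1.0 - p.get(y, 0.0) for p, y in zip(probs, labels)]
    return sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)


def lift_over_random(ranked: Sequence[int], true_errors: Set[int], k: int, n: int) -> float:
    """Precision among the top-k suspects divided by the corpus error rate."""
    precision = sum(1 for i in ranked[:k] if i in true_errors) / k
    base_rate = len(true_errors) / n  # expected precision of random selection
    return precision / base_rate


# Toy data (hypothetical): model class probabilities and the corpus labels.
probs = [
    {"NOUN": 0.9, "VERB": 0.1},
    {"NOUN": 0.2, "VERB": 0.8},
    {"NOUN": 0.6, "VERB": 0.4},
    {"NOUN": 0.1, "VERB": 0.9},
]
labels = ["NOUN", "NOUN", "NOUN", "VERB"]
true_errors = {1}  # index 1 is mislabelled in this toy corpus

ranked = rank_suspects(probs, labels)
lift = lift_over_random(ranked, true_errors, k=1, n=len(labels))
print(ranked, lift)
```

Under this reading, a lift above 1 means an annotator reviewing the top-ranked instances finds errors more often than one reviewing instances picked at random.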