9th International Conference on Language Resources and Evaluation

Extrinsic Corpus Evaluation with a Collocation Dictionary Task

Abstract

The NLP researcher or application-builder often wonders "what corpus should I use, or should I build one of my own? If I build one of my own, how will I know if I have done a good job?" Currently there is very little help available for them. They are in need of a framework for evaluating corpora. We develop such a framework, in relation to corpora which aim for good coverage of 'general language'. The task we set is automatic creation of a publication-quality collocations dictionary. For a sample of 100 headwords of Czech and 100 of English, we identify a gold standard dataset of (ideally) all the collocations that should appear for these headwords in such a dictionary. The datasets are being made available alongside this paper. We then use them to determine precision and recall for a range of corpora, with a range of parameters.
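
The evaluation step described in the abstract, scoring a corpus's extracted collocations against a gold standard, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `precision_recall`, the choice of micro-averaging across headwords, and the toy headwords and collocations, which are invented and not taken from the paper's datasets.

```python
from typing import Dict, Set, Tuple


def precision_recall(
    candidates: Dict[str, Set[str]],
    gold: Dict[str, Set[str]],
) -> Tuple[float, float]:
    """Micro-averaged precision and recall over the gold headwords.

    `candidates` maps each headword to the collocations extracted from a
    corpus; `gold` maps each headword to the collocations a dictionary
    should list. Both structures are hypothetical, chosen for this sketch.
    """
    true_positives = proposed = expected = 0
    for headword, gold_set in gold.items():
        candidate_set = candidates.get(headword, set())
        true_positives += len(candidate_set & gold_set)
        proposed += len(candidate_set)
        expected += len(gold_set)
    precision = true_positives / proposed if proposed else 0.0
    recall = true_positives / expected if expected else 0.0
    return precision, recall


# Invented toy data, not from the paper's gold standard.
gold = {
    "tea": {"strong tea", "green tea", "cup of tea"},
    "rain": {"heavy rain", "pouring rain"},
}
candidates = {
    "tea": {"strong tea", "green tea", "hot tea", "tea bag"},
    "rain": {"heavy rain"},
}

precision, recall = precision_recall(candidates, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Running the same function over candidate sets drawn from different corpora, or from one corpus with different extraction parameters, would give the kind of comparison the abstract describes. Whether the authors average per headword or over all collocations is not stated here.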
