Venue: International Conference on Computational Linguistics

Comparing Word Relatedness Measures Based on Google n-grams

Abstract

Estimating word relatedness is essential in natural language processing (NLP) and in many related areas. Corpus-based measures of word relatedness have advantages over knowledge-based supervised measures. However, many corpus-based measures in the literature cannot be compared with one another because each was computed over a different corpus. The purpose of this paper is to show how different corpus-based measures of word relatedness can be evaluated by calculating them over a common corpus (i.e., the Google n-grams) and then assessing their performance against gold-standard relatedness datasets. As a starting point, we re-implement six of these measures, using the Google n-gram corpus as their only resource, and compare their performance on five different datasets. We also show how a word relatedness measure based on a web search engine can be implemented using the Google n-gram corpus.
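The abstract does not name the six measures, the search-engine measure, or the evaluation statistic, so the following is only an illustrative sketch under stated assumptions. It assumes one well-known search-engine measure, the Normalized Google Distance (NGD) of Cilibrasi and Vitányi, with page-hit counts replaced by the summed frequencies of n-grams that contain the word(s), and it assumes measures are scored by Spearman rank correlation against gold-standard ratings. The function names (`cooccurrence_counts`, `ngd`, `spearman`), the "each n-gram stands in for a page" analogy, and all data are hypothetical; the toy counts and ratings are invented and are not taken from the Google n-gram corpus or any real gold-standard dataset.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(ngrams):
    """Sum n-gram frequencies per word and per word pair.

    Assumption: `ngrams` is an iterable of (tokens, count) pairs standing in for
    Google n-gram entries, and each n-gram plays the role of a search-engine "page".
    """
    term, pair, total = Counter(), Counter(), 0
    for tokens, count in ngrams:
        total += count
        types = sorted(set(tokens))  # count each word once per n-gram
        for w in types:
            term[w] += count
        for a, b in combinations(types, 2):  # pairs are emitted in sorted order
            pair[(a, b)] += count
    return term, pair, total


def ngd(x, y, term, pair, total):
    """NGD with page hits replaced by n-gram counts (smaller = more related).

    Returns inf when either word is unseen or the two words never co-occur.
    """
    f_x, f_y = term[x], term[y]
    f_xy = pair[tuple(sorted((x, y)))]
    if f_x == 0 or f_y == 0 or f_xy == 0:
        return math.inf
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    den = math.log(total) - min(lx, ly)
    if den <= 0.0:  # both words occur in every n-gram, so they always co-occur
        return 0.0
    return (max(lx, ly) - lxy) / den


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Invented toy data: NOT real Google n-gram counts or real gold-standard ratings.
TOY_NGRAMS = [
    (("car", "and", "automobile"), 50),
    (("the", "car", "engine"), 30),
    (("the", "coast", "and", "shore"), 40),
    (("the", "car"), 200),
    (("the", "coast"), 100),
    (("shore", "leave"), 10),
    (("automobile", "industry"), 25),
]
GOLD = [("car", "automobile", 4.0), ("coast", "shore", 3.5),
        ("car", "shore", 0.5), ("automobile", "coast", 0.2)]

term, pair, total = cooccurrence_counts(TOY_NGRAMS)
# Negate NGD so that larger scores mean "more related", matching the gold ratings.
scores = [-ngd(a, b, term, pair, total) for a, b, _ in GOLD]
rho = spearman(scores, [g for _, _, g in GOLD])
print(f"Spearman rho = {rho:.3f}")
```

Rank correlation is used here because distance-style and similarity-style measures live on different scales; only their orderings of word pairs are compared with the human ratings. Pairs that never co-occur all receive the same worst score and are treated as ties.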