Comparing Word Relatedness Measures Based on Google n-grams

Abstract

Estimating word relatedness is essential in natural language processing (NLP) and in many related areas. Corpus-based word relatedness has advantages over knowledge-based supervised measures. Many corpus-based measures in the literature cannot be compared with each other because they use different corpora. The purpose of this paper is to show how to evaluate different corpus-based measures of word relatedness by calculating them over a common corpus (i.e., the Google n-grams) and then assessing their performance with respect to gold standard relatedness datasets. As a starting point, we evaluate six of these measures, all re-implemented using the Google n-gram corpus as their only resource, by comparing their performance on five different datasets. We also show how a word relatedness measure based on a web search engine can be implemented using the Google n-gram corpus.
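The evaluation workflow described in the abstract (compute a measure from counts in one shared corpus, then compare its scores with human judgments) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the six measures, so pointwise mutual information (PMI) is used here only as a familiar stand-in, and every count, word pair, and gold score below is an invented toy value rather than data from the Google n-gram corpus or any published dataset.

```python
import math

# ASSUMPTION: toy frequencies invented for illustration; they are not
# taken from the Google n-gram corpus.
UNIGRAM_COUNTS = {
    "car": 1000, "automobile": 200,
    "banana": 500, "fruit": 800,
    "king": 600, "cabbage": 100,
}
# ASSUMPTION: invented co-occurrence counts for unordered word pairs.
PAIR_COUNTS = {
    frozenset(("car", "automobile")): 50,
    frozenset(("banana", "fruit")): 80,
    frozenset(("king", "cabbage")): 1,
}
TOTAL_TOKENS = 1_000_000  # hypothetical corpus size


def pmi(w1, w2):
    """Pointwise mutual information (base 2) from the toy counts.

    Pairs that never co-occur get 0.0, which is a simplification.
    """
    joint = PAIR_COUNTS.get(frozenset((w1, w2)), 0)
    if joint == 0:
        return 0.0
    p_joint = joint / TOTAL_TOKENS
    p1 = UNIGRAM_COUNTS[w1] / TOTAL_TOKENS
    p2 = UNIGRAM_COUNTS[w2] / TOTAL_TOKENS
    return math.log2(p_joint / (p1 * p2))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# ASSUMPTION: made-up human relatedness ratings, not a real gold standard.
GOLD = [
    ("car", "automobile", 3.9),
    ("banana", "fruit", 3.5),
    ("king", "cabbage", 0.2),
]


def evaluate(measure, gold):
    """Correlate a measure's scores with the human ratings in `gold`."""
    predicted = [measure(w1, w2) for w1, w2, _ in gold]
    human = [score for _, _, score in gold]
    return pearson(predicted, human)


if __name__ == "__main__":
    print(f"Correlation with toy gold standard: {evaluate(pmi, GOLD):.3f}")
```

Because `evaluate` takes the measure as a function argument, any other measure defined over the same count tables could be passed in and scored against the same word pairs, which is the kind of like-for-like comparison a common corpus makes possible.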

Bibliographic information

Source
International Conference on Computational Linguistics | 2012 | pp. 495-506 | 12 pages
Authors
Aminul ISLAM; Evangelos MILIOS; Vlado KESELJ
Original format
PDF
Keywords
Word Relatedness; Similarity; Corpus; Unsupervised; Google n-grams; Digrams

Similar literature

1. Measuring Word Semantic Relatedness Using WordNet-Based Approach [J]. Tingting Wei, Huiyou Chang. Journal of Computers, 2015, No. 4.
2. Measuring Author Research Relatedness: A Comparison of Word-Based, Topic-Based, and Author Cocitation Approaches [J]. Kun Lu, Dietmar Wolfram. Journal of the American Society for Information Science and Technology, 2012, No. 10.
3. The TF-IDF measure and analysis of links between words within N-grams in the formation of knowledge units for open tests [J]. G. M. Emelyanov, D. V. Mikhailov, A. P. Kozlov. Pattern recognition and image analysis: advances in mathematical theory and applications in the USSR, 2017, No. 4.
4. Comparing Word Relatedness Measures Based on Google n-grams [C]. Aminul ISLAM, Evangelos MILIOS, Vlado KESELJ. International conference on computational linguistics, 2012.
5. Micro-AIRS: A microcomputer-based Arabic information retrieval system comparing words, stems, and roots as index terms [D]. Al-Kharashi, Ibrahim A. 1991.
6. Words prediction based on N-gram model for free-text entry in electronic health records [O]. Azita Yazdani, Reza Safdari, Ali Golkar. 2019.
7. Measuring Word Semantic Relatedness Using WordNet-Based Approach [O]. Tingting Wei, Huiyou Chang. 2015.
