Journal of Information Recording

Self-Supervised Synonym Extraction from the Web

Abstract

Current synonym extraction methods work in a "closed" way: given a problem word and a set of target words, researchers must choose the words synonymous with the problem word, using features such as lexical patterns and distributional similarity. This paper instead attempts to discover synonyms in an "open" way and presents a synonym extraction framework based on self-supervised learning. We first analyze the nature of the open approach and argue that a trained, pattern-independent model for synonym extraction is feasible. We then model the extraction of synonyms from sentences as a sequential labeling problem and automatically generate labeled training samples using structured knowledge from online encyclopedias together with some generic heuristic rules. Finally, we train Conditional Random Field (CRF) models and use them to extract synonyms from the web. We extract more than 20 million facts, which contain 826,219 distinct synonym pairs.
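
The abstract frames synonym extraction as sequence labeling, with training labels produced automatically from encyclopedia knowledge rather than by hand. The following is a minimal Python sketch of that idea, not the authors' code. It assumes a BIO tag set with `TERM` and `SYN` labels, whitespace tokenization, and a toy seed list standing in for encyclopedia alias fields. `SEED_PAIRS`, `auto_label`, `decode_spans`, and `extract_pairs` are hypothetical names, and the overlap check is only one illustrative heuristic rule. The CRF itself is omitted. In the described framework, a trained CRF would label unseen web sentences; here, the auto-labeled sentences stand in for its predictions.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical seed knowledge standing in for encyclopedia alias fields.
SEED_PAIRS: List[Tuple[str, str]] = [
    ("Beijing", "Peking"),
    ("aspirin", "acetylsalicylic acid"),
]


def find_span(tokens: List[str], phrase: List[str]) -> int:
    """Return the start index of the first occurrence of phrase in tokens, or -1."""
    n = len(phrase)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            return i
    return -1


def auto_label(tokens: List[str], seeds: List[Tuple[str, str]]) -> Optional[List[str]]:
    """BIO-tag a sentence that mentions both members of some seed pair.

    TERM marks the head word and SYN its synonym (an assumed tag set).
    Returns None if no seed pair co-occurs, so the sentence is not used.
    """
    for term, syn in seeds:
        t_toks, s_toks = term.split(), syn.split()
        t_start, s_start = find_span(tokens, t_toks), find_span(tokens, s_toks)
        if t_start < 0 or s_start < 0:
            continue
        # Illustrative heuristic rule: skip pairs whose mentions overlap.
        if t_start < s_start + len(s_toks) and s_start < t_start + len(t_toks):
            continue
        labels = ["O"] * len(tokens)
        for start, length, tag in ((t_start, len(t_toks), "TERM"),
                                   (s_start, len(s_toks), "SYN")):
            labels[start] = "B-" + tag
            for j in range(start + 1, start + length):
                labels[j] = "I-" + tag
        return labels
    return None


def decode_spans(tokens: List[str], labels: List[str]) -> Dict[str, List[str]]:
    """Group a BIO label sequence back into tagged phrases."""
    spans: Dict[str, List[str]] = {}
    tag: Optional[str] = None
    words: List[str] = []
    for tok, lab in zip(tokens, labels):
        if lab.startswith("I-") and tag == lab[2:]:
            words.append(tok)  # continue the open span
            continue
        if tag is not None:  # any other label closes the open span
            spans.setdefault(tag, []).append(" ".join(words))
            tag, words = None, []
        if lab.startswith("B-"):
            tag, words = lab[2:], [tok]
    if tag is not None:
        spans.setdefault(tag, []).append(" ".join(words))
    return spans


def extract_pairs(tagged: List[Tuple[List[str], List[str]]]) -> Tuple[int, Set[Tuple[str, str]]]:
    """Count (term, synonym) facts and collect the distinct pairs among them."""
    facts = 0
    distinct: Set[Tuple[str, str]] = set()
    for tokens, labels in tagged:
        spans = decode_spans(tokens, labels)
        for term in spans.get("TERM", []):
            for syn in spans.get("SYN", []):
                facts += 1
                distinct.add((term.lower(), syn.lower()))
    return facts, distinct


sentences = [
    "Beijing , also known as Peking , is the capital of China".split(),
    "Peking was the common English name for Beijing".split(),
    "The weather in Beijing is cold".split(),
]
# Auto-labeled sentences. In the described framework these would train a CRF,
# and the CRF's predictions on new web text would be passed to extract_pairs.
training: List[Tuple[List[str], List[str]]] = []
for toks in sentences:
    labs = auto_label(toks, SEED_PAIRS)
    if labs is not None:
        training.append((toks, labs))

facts, pairs = extract_pairs(training)
print(facts, sorted(pairs))  # prints: 2 [('beijing', 'peking')]
```

The toy run yields two facts but only one distinct pair. This mirrors the abstract's distinction between the number of extracted facts and the number of distinct synonym pairs. The abstract does not specify the actual tag set, features, or heuristic rules, so each of those choices above is an assumption.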
