International Joint Conference on Neural Networks

Not All Synonyms Are Created Equal: Incorporating Similarity of Synonyms to Enhance Word Embeddings



Abstract

Traditional word embedding approaches learn semantic information from the contexts of words in large unlabeled corpora. This ignores the fact that synonymous words often occur in different contexts within a corpus, so synonymy is not well captured in the resulting vectors. Furthermore, existing synonymy-based models incorporate synonyms directly when training word embeddings, but still neglect the degree of similarity between words and their synonyms. In this paper, we explore a novel approach that employs the similarity between words and their synonyms to train and enhance word embeddings. To this end, we build two Synonymy Similarity Models (SSMs), named SSM-W and SSM-M, which adopt different strategies to incorporate the similarity between words and their synonyms during training. We evaluate our models on both Chinese and English. The results demonstrate that our models outperform the baselines on seven word similarity datasets. On the analogical reasoning and text classification tasks, our models also surpass all baselines, including a synonymy-based model.
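
The abstract does not spell out the SSM-W or SSM-M training objectives, so the following is a minimal sketch only, assuming a skip-gram-style base objective with negative sampling. It illustrates the core idea that not all synonyms are created equal: each word-synonym pair contributes an attraction term scaled by the pair's current cosine similarity, so loosely related synonym pairs pull more weakly than strongly similar ones. The toy corpus, the SYNONYMS dictionary, and all hyperparameters are hypothetical, not taken from the paper.

```python
# Minimal sketch only: the SSM-W/SSM-M objectives are not given in the
# abstract. This combines plain skip-gram negative sampling with a
# synonym-attraction term weighted by the current word-synonym cosine
# similarity. Corpus, SYNONYMS, and hyperparameters are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

corpus = "the quick fast fox jumps over the lazy idle dog".split()
SYNONYMS = {"quick": ["fast"], "lazy": ["idle"]}  # toy thesaurus

vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
dim, lr, lam = 16, 0.05, 0.1  # embedding size, step size, synonym-term weight
W = rng.normal(scale=0.1, size=(len(vocab), dim))  # word vectors
C = rng.normal(scale=0.1, size=(len(vocab), dim))  # context vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

for epoch in range(200):
    for t, w in enumerate(corpus):
        wi = idx[w]
        # Skip-gram with one random negative sample per (word, context) pair.
        for c in corpus[max(0, t - 2):t] + corpus[t + 1:t + 3]:
            for tgt, label in ((idx[c], 1.0), (int(rng.integers(len(vocab))), 0.0)):
                g = sigmoid(W[wi] @ C[tgt]) - label
                grad_w = g * C[tgt]  # both gradients taken before either update
                grad_c = g * W[wi]
                W[wi] -= lr * grad_w
                C[tgt] -= lr * grad_c
        # Synonym term: pull the word toward each of its synonyms, scaled by
        # their current similarity, so closer synonym pairs pull harder.
        for s in SYNONYMS.get(w, []):
            si = idx[s]
            weight = max(cosine(W[wi], W[si]), 0.0)
            W[wi] -= lr * lam * weight * (W[wi] - W[si])

print("cos(quick, fast) =", round(cosine(W[idx["quick"]], W[idx["fast"]]), 3))
```

Under this weighting, the synonym pull strengthens as co-occurrence training draws a pair together, rather than treating every thesaurus entry as equally reliable; the paper's SSM-W and SSM-M presumably realize such weighting through the two different strategies the abstract mentions.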
