Conference: Annual Meeting of the Association for Computational Linguistics
Exploiting Common Characters in Chinese and Japanese to Learn Cross-lingual Word Embeddings via Matrix Factorization

Abstract

Learning vector space representations of words (i.e., word embeddings) has recently attracted wide research interest and has been extended to the cross-lingual scenario. Most current cross-lingual word embedding models are based on sentence alignment, which inevitably introduces considerable noise. In this paper, we show that in Chinese and Japanese, the acquisition of semantic relations among words can benefit from the large number of common characters shared by the two languages. Inspired by this unique feature, we design a method named CJC that generates cross-lingual contexts for words. We combine CJC with GloVe, which is based on matrix factorization, and propose an integrated model named CJ-Glo. Taking two sentence-aligned models and CJ-BOC (which also exploits common characters but is based on CBOW) as baselines, we compare them with CJ-Glo on a series of NLP tasks, including cross-lingual synonym, word analogy, and sentence alignment. The results indicate that CJ-Glo achieves the best performance among these methods and is more stable in cross-lingual tasks; moreover, compared with CJ-BOC, CJ-Glo is less sensitive to changes in parameters.
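
To make the idea of common characters concrete, the following is a minimal toy sketch of how shared characters could link words across a Chinese and a Japanese vocabulary. It is not the procedure described in the paper: the mapping table `ZH_TO_JA`, the helper names `shared_chars` and `cross_lingual_links`, the `min_shared` threshold, and the word lists are all hypothetical and chosen only for illustration.

```python
# Toy illustration only: the table and vocabularies below are made-up
# examples, not data or code from the paper.

# Hypothetical table mapping Simplified Chinese characters to the Japanese
# kanji treated as the same character. Unlisted characters map to themselves.
ZH_TO_JA = {"图": "図", "书": "書", "馆": "館", "经": "経", "济": "済"}


def shared_chars(zh_word, ja_word):
    """Return the kanji in ja_word that correspond to characters in zh_word."""
    mapped = {ZH_TO_JA.get(ch, ch) for ch in zh_word}
    return mapped & set(ja_word)


def cross_lingual_links(zh_vocab, ja_vocab, min_shared=1):
    """Pair each Chinese word with the Japanese words that share at least
    min_shared characters with it (after mapping through ZH_TO_JA)."""
    links = {}
    for zh in zh_vocab:
        matches = [ja for ja in ja_vocab
                   if len(shared_chars(zh, ja)) >= min_shared]
        if matches:
            links[zh] = matches
    return links


zh_vocab = ["图书馆", "经济", "学生"]
ja_vocab = ["図書館", "経済", "学生", "ねこ"]
links = cross_lingual_links(zh_vocab, ja_vocab, min_shared=2)
print(links)
```

Links of this kind could, under these assumptions, let a word in one language contribute to the context counts of a word in the other, which is the sort of signal a co-occurrence-based model such as GloVe consumes. How the paper actually builds and weights its cross-lingual contexts is not stated in the abstract.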