Journal: Procedia Computer Science

Word Sense Disambiguation for Arabic Exploiting Arabic WordNet and Word Embedding

Abstract

Word Sense Disambiguation (WSD) is the task of identifying the meaning of a word given its context. The problem has been studied in depth for English. Work on Arabic, however, has been limited, despite there being half a billion native Arabic speakers. In this work, we present multiple approaches to WSD in Arabic that build on recent successes in learning word embeddings with methods such as GloVe and Word2vec. The primary shortcoming of word embeddings is that each word's meaning is represented by a single vector, although many words are polysemous. Our main contribution is to computationally obtain an embedding for each sense, using Arabic WordNet (AWN), to address the WSD problem. We also compute word semantic similarity, taking multiple Arabic stemming algorithms into account. Finally, we make available a large pre-processed corpus that is ready for further experiments, together with WSD test data based on AWN, seeking to fill gaps in Arabic NLP (ANLP) relative to English.
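The abstract does not spell out how the per-sense embeddings are computed, so the following is only a minimal sketch of one common approach, not the authors' method. It assumes that a sense vector is the average of the word vectors of the words associated with that sense in a WordNet-style inventory, and that the chosen sense is the one whose vector has the highest cosine similarity with the averaged context vector. The toy vectors, the sense identifiers (`ayn_eye`, `ayn_spring`), and all function names are hypothetical; in practice the vectors would come from a trained GloVe or Word2vec model and the inventory from AWN.

```python
import math

def mean_vector(vectors):
    """Average a non-empty list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def sense_embedding(related_words, embeddings):
    """Assumed construction: average the vectors of a sense's related words."""
    vectors = [embeddings[w] for w in related_words if w in embeddings]
    return mean_vector(vectors) if vectors else None

def disambiguate(context_words, sense_inventory, embeddings):
    """Return the sense id closest to the averaged context vector, or None
    if no context word has an embedding."""
    context_vectors = [embeddings[w] for w in context_words if w in embeddings]
    if not context_vectors:
        return None
    context = mean_vector(context_vectors)
    best_sense, best_score = None, -2.0  # cosine is always >= -1
    for sense_id, related_words in sense_inventory.items():
        vector = sense_embedding(related_words, embeddings)
        if vector is None:
            continue
        score = cosine(context, vector)
        if score > best_score:
            best_sense, best_score = sense_id, score
    return best_sense

# Hypothetical 3-dimensional toy vectors (English glosses stand in for Arabic tokens).
TOY_EMBEDDINGS = {
    "see": [0.9, 0.1, 0.0],
    "face": [0.8, 0.2, 0.1],
    "doctor": [0.7, 0.3, 0.0],
    "water": [0.1, 0.9, 0.1],
    "well": [0.0, 0.8, 0.2],
    "drink": [0.2, 0.9, 0.0],
}

# Hypothetical inventory for an ambiguous word meaning either "eye" or "water spring".
TOY_SENSES = {
    "ayn_eye": ["see", "face"],
    "ayn_spring": ["water", "well"],
}

if __name__ == "__main__":
    print(disambiguate(["doctor"], TOY_SENSES, TOY_EMBEDDINGS))
    print(disambiguate(["drink"], TOY_SENSES, TOY_EMBEDDINGS))
```

Averaging is the simplest way to give each sense its own vector. Because the abstract mentions comparing several Arabic stemming algorithms, a real pipeline would likely stem both the context words and the inventory words before the lookup, a step this sketch omits.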