Venue: Australasian Language Technology Association workshop

Multilingual lexical resources to detect cognates in non-aligned texts

Abstract

The identification of cognates between two distinct languages has recently started to attract the attention of NLP research, but there has been little research into using semantic evidence to detect cognates. The approach presented in this paper aims to detect English-French cognates within monolingual texts (texts that are not accompanied by aligned translated equivalents), by integrating word shape similarity approaches with word sense disambiguation techniques in order to account for context. Our implementation is based on BabelNet, a semantic network that incorporates a multilingual encyclopedic dictionary. Our approach is evaluated on two manually annotated datasets. The first one shows that across different types of natural text, our method can identify the cognates with an overall accuracy of 80%. The second one, consisting of control sentences with semi-cognates acting as either true cognates or false friends, shows that our method can identify 80% of semi-cognates acting as cognates but also identifies 75% of the semi-cognates acting as false friends.
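
The combination of word shape similarity with sense evidence described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which similarity measure the authors use, so the longest common subsequence ratio stands in for "word shape similarity"; the sense labels such as `sense:diagram` are hypothetical placeholders rather than real BabelNet identifiers; and the function names and the 0.7 threshold are invented here. No BabelNet call is made; the disambiguated sense is passed in by hand.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Lowercase a word and drop diacritics, so 'Été' compares as 'ete'."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def shape_similarity(en: str, fr: str) -> float:
    """Longest common subsequence ratio in [0, 1] (an assumed stand-in measure)."""
    a, b = strip_accents(en), strip_accents(fr)
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def is_cognate(en: str, fr: str, en_sense: str, fr_senses: set,
               threshold: float = 0.7) -> bool:
    """Accept a pair only if the words look alike AND the sense chosen for
    the English word in context is one the French word can also express.

    en_sense is assumed to come from a word sense disambiguation step;
    fr_senses is assumed to come from a multilingual sense inventory.
    """
    looks_alike = shape_similarity(en, fr) >= threshold
    return looks_alike and en_sense in fr_senses


# Hypothetical sense inventory for the French word "figure".
FR_FIGURE_SENSES = {"sense:diagram", "sense:face"}

# Semi-cognate acting as a true cognate: English "figure" meaning a diagram.
print(is_cognate("figure", "figure", "sense:diagram", FR_FIGURE_SENSES))

# Semi-cognate acting as a false friend: English "figure" meaning a number.
print(is_cognate("figure", "figure", "sense:number", FR_FIGURE_SENSES))
```

The two calls at the end mirror the second evaluation set: the same orthographically identical pair is accepted or rejected depending only on the sense assigned in context, which is the case a purely shape-based detector cannot separate.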