Multilingual lexical resources to detect cognates in non-aligned texts



Abstract

The identification of cognates between two distinct languages has recently started to attract the attention of NLP research, but there has been little research into using semantic evidence to detect cognates. The approach presented in this paper aims to detect English-French cognates within monolingual texts (texts that are not accompanied by aligned translated equivalents), by integrating word shape similarity approaches with word sense disambiguation techniques in order to account for context. Our implementation is based on BabelNet, a semantic network that incorporates a multilingual encyclopedic dictionary. Our approach is evaluated on two manually annotated datasets. The first one shows that across different types of natural text, our method can identify the cognates with an overall accuracy of 80%. The second one, consisting of control sentences with semi-cognates acting as either true cognates or false friends, shows that our method can identify 80% of semi-cognates acting as cognates but also identifies 75% of the semi-cognates acting as false friends.
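
The combination of word shape similarity with a sense check can be illustrated with a minimal sketch. The abstract does not say which similarity measure or threshold the authors use, so the normalized edit distance, the 0.7 threshold, and all function names below are assumptions chosen for illustration. The `same_sense` flag is a hypothetical stand-in for the outcome of the word sense disambiguation step; no BabelNet lookup is performed here.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Lowercase a word and drop combining accents (e.g. 'é' -> 'e')."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def shape_similarity(english: str, french: str) -> float:
    """Similarity in [0, 1]: 1 minus the length-normalized edit distance."""
    a, b = strip_accents(english), strip_accents(french)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def is_cognate_candidate(english: str, french: str, same_sense: bool,
                         threshold: float = 0.7) -> bool:
    """Accept a pair only if the senses match AND the word shapes are close.

    `same_sense` is assumed to come from a separate disambiguation step;
    `threshold` is an illustrative value, not one taken from the paper.
    """
    return same_sense and shape_similarity(english, french) >= threshold
```

Under these assumptions, a pair such as "education" / "éducation" passes the shape test, and the sense flag decides whether a similar-looking pair is treated as a true cognate or rejected as a false friend in the given context.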


Bibliographic details

Source: Australasian Language Technology Association workshop, 2014, pp. 14-22 (9 pages)
Authors: Haoxing Wang; Laurianne Sitbon
Original format: PDF

