Conference: International Conference on Innovative Computing Technology

Extraction of medical terms for word sense disambiguation within multilingual framework

Abstract

All languages belonging to the same language family share a number of common characteristics, called language pair phenomena, which are useful when processing them for multilingual purposes such as translation across cognate languages, building dictionaries, thesauri, and transcript collections, or multilingual text retrieval of digital documents. In addition, it is estimated that more than 30% of English vocabulary has been inherited from Latin, which dominates medical terminology in particular. We exploit this fact by exploring word sense disambiguation (WSD) in a multilingual environment. In the medical domain specifically, language pair phenomena can be limited to synonymy of cognate technical terms. We investigate our approach by comparing Boolean and Free Text Search modes. To measure the efficiency of our methodology we use the classical Salton model of tf-idf term weighting schemes, as extended by Karen Spärck Jones. Our results are very promising: they indicate that the similarity between synonymous English medical terms and their target-language equivalents significantly limits the number of target word senses, even for languages outside the language family, as with the English and Polish language pair. Limiting the number of target word senses results in better, more context-driven disambiguation and consequently translates into higher precision in multilingual medical information retrieval.
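
The abstract names tf-idf term weighting and a comparison of Boolean and Free Text Search modes but gives no formulas. As a rough illustration only, the sketch below uses a common textbook variant (term frequency normalised by document length, multiplied by log(N/df)) on a toy corpus. The documents, the function names, and the exact weighting variant are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter

# Toy corpus: hypothetical documents, not data from the paper.
docs = {
    "d1": "acute renal failure treatment",
    "d2": "renal artery stenosis",
    "d3": "cardiac failure and renal failure",
}

tokenized = {doc_id: text.split() for doc_id, text in docs.items()}
n_docs = len(tokenized)

# Document frequency: in how many documents each term occurs.
df = Counter()
for tokens in tokenized.values():
    df.update(set(tokens))


def idf(term):
    """Inverse document frequency, log(N / df); 0.0 for unseen terms."""
    return math.log(n_docs / df[term]) if df[term] else 0.0


def tf_idf(term, doc_id):
    """Length-normalised term frequency times idf (one common variant)."""
    tokens = tokenized[doc_id]
    return (tokens.count(term) / len(tokens)) * idf(term)


def boolean_search(query):
    """Boolean AND: return documents that contain every query term."""
    terms = query.split()
    return sorted(
        doc_id
        for doc_id, tokens in tokenized.items()
        if all(term in tokens for term in terms)
    )


def free_text_search(query):
    """Free text: rank documents by summed tf-idf, dropping zero scores."""
    terms = query.split()
    scores = {d: sum(tf_idf(t, d) for t in terms) for d in tokenized}
    ranked = sorted(
        ((score, d) for d, score in scores.items() if score > 0),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [d for _, d in ranked]


print(boolean_search("renal failure"))
print(free_text_search("renal failure"))
```

In this toy setting the Boolean mode returns an unordered match set, while the free-text mode orders documents by weight. A term that occurs in every document (here "renal") has an idf of zero and contributes nothing to the ranking.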