Journal of the American Medical Informatics Association (JAMIA)

Automatic Resolution of Ambiguous Terms Based on Machine Learning and Conceptual Relations in the UMLS


Abstract

Motivation. The UMLS has been used in natural language processing applications such as information retrieval and information extraction systems. The mapping of free-text to UMLS concepts is important for these applications. To improve the mapping, we need a method to disambiguate terms that possess multiple UMLS concepts. In the general English domain, machine-learning techniques have been applied to sense-tagged corpora, in which senses (or concepts) of ambiguous terms have been annotated (mostly manually). Sense disambiguation classifiers are then derived to determine senses (or concepts) of those ambiguous terms automatically. However, manual annotation of a corpus is an expensive task. We propose an automatic method that constructs sense-tagged corpora for ambiguous terms in the UMLS using MEDLINE abstracts.

Methods. For a term W that represents multiple UMLS concepts, a collection of MEDLINE abstracts that contain W is extracted. For each abstract in the collection, occurrences of concepts that have relations with W as defined in the UMLS are automatically identified. A sense-tagged corpus, in which senses of W are annotated, is then derived based on those identified concepts. The method was evaluated on a set of 35 frequently occurring ambiguous biomedical abbreviations using a gold standard set that was automatically derived. The quality of the derived sense-tagged corpus was measured using precision and recall.

Results. The derived sense-tagged corpus had an overall precision of 92.9% and an overall recall of 47.4%. After removing rare senses and ignoring abbreviations with closely related senses, the overall precision was 96.8% and the overall recall was 50.6%.

Conclusions. UMLS conceptual relations and MEDLINE abstracts can be used to automatically acquire knowledge needed for resolving ambiguity when mapping free-text to UMLS concepts.
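The tagging idea in the Methods paragraph can be illustrated with a small sketch. This is not the authors' implementation. The abbreviation "CSF", its two senses, the related-term lists, the substring matching, and the rule of tagging only when exactly one sense has evidence are all assumptions made for illustration; in the paper the related concepts come from UMLS conceptual relations. Recall is assumed here to mean correctly tagged instances divided by all gold-standard instances.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical related-term lists for two senses of the abbreviation "CSF".
# In the paper these would be derived from UMLS conceptual relations; the
# entries below are invented purely for illustration.
RELATED_TERMS: Dict[str, List[str]] = {
    "cerebrospinal fluid": ["lumbar puncture", "meningitis", "spinal"],
    "colony-stimulating factor": ["granulocyte", "macrophage", "neutrophil"],
}


def tag_sense(abstract: str, related: Dict[str, List[str]]) -> Optional[str]:
    """Return the one sense whose related terms occur in the abstract.

    Returns None when no sense, or more than one sense, has evidence
    (an assumed rule; the abstract does not specify how ties are handled).
    """
    text = abstract.lower()
    matched = [
        sense for sense, terms in related.items()
        if any(term in text for term in terms)
    ]
    return matched[0] if len(matched) == 1 else None


def precision_recall(
    predicted: List[Optional[str]], gold: List[str]
) -> Tuple[float, float]:
    """Precision over tagged instances; recall over all gold instances."""
    tagged = [(p, g) for p, g in zip(predicted, gold) if p is not None]
    correct = sum(1 for p, g in tagged if p == g)
    precision = correct / len(tagged) if tagged else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy abstracts (invented) and their gold-standard senses.
abstracts = [
    "CSF obtained by lumbar puncture was analysed.",
    "Granulocyte CSF increased neutrophil counts.",
    "CSF levels were measured in all patients.",
    "Spinal CSF and macrophage CSF were compared.",
]
gold = [
    "cerebrospinal fluid",
    "colony-stimulating factor",
    "colony-stimulating factor",
    "cerebrospinal fluid",
]

predicted = [tag_sense(a, RELATED_TERMS) for a in abstracts]
print(predicted)
print(precision_recall(predicted, gold))
```

In this toy data the third abstract has no related terms and the fourth has evidence for both senses, so both stay untagged. Leaving uncertain instances untagged is one way a method of this kind can end up with high precision and much lower recall, the pattern reported in the Results paragraph.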
