
The CLASSE GATOR (CLinical Acronym SenSE disambiGuATOR): A Method for predicting acronym sense from neonatal clinical notes


Abstract

Objective: To develop an algorithm for identifying acronym 'sense' from clinical notes without requiring a clinically annotated training set.

Materials and Methods: Our algorithm is called CLASSE GATOR: Clinical Acronym SenSE disambiGuATOR. CLASSE GATOR extracts acronyms and definitions from PubMed Central (PMC). A logistic regression model is trained using words associated with specific acronym-definition pairs from PMC. CLASSE GATOR uses this library of acronym-definitions and their corresponding word feature vectors to predict the acronym 'sense' from Beth Israel Deaconess (MIMIC-III) neonatal notes.

Results: We identified 1,257 acronyms and 8,287 definitions including a random definition from 31,764 PMC articles on prenatal exposures and 2,227,674 PMC open access articles. The average number of senses (definitions) per acronym was 6.6 (min = 2, max = 50). The average internal 5-fold cross validation was 87.9 % (on PMC). We found 727 unique acronyms (57.29 %) from PMC were present in 105,044 neonatal notes (MIMIC-III). We evaluated the performance of acronym prediction using 245 manually annotated clinical notes with 9 distinct acronyms. CLASSE GATOR achieved an overall accuracy of 63.04 % and outperformed random for 8/9 acronyms (88.89 %) when applied to clinical notes. We also compared our algorithm with UMN's acronym set, and found that CLASSE GATOR outperformed random for 63.46 % of 52 acronyms when using logistic regression, 75.00 % when using BERT and 76.92 % when using BioBERT as the prediction algorithm within CLASSE GATOR.

Conclusions: CLASSE GATOR is the first automated acronym sense disambiguation method for clinical notes. Importantly, CLASSE GATOR does not require an expensive manually annotated acronym-definition corpus for training.
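The core idea in the abstract (learn which context words go with each acronym-definition pair in the literature, then use those word features to pick a sense in a clinical note) can be illustrated with a minimal sketch. This is not the authors' implementation. The training sentences are invented toy data, the acronym PDA is chosen only for illustration, the names `tokenize`, `featurize`, `train`, and `predict` are hypothetical, and a small pure-Python softmax (multinomial logistic) regression stands in for whatever logistic regression setup the paper used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def featurize(text, acronym, vocab):
    """Bag-of-words counts over vocab, ignoring the acronym itself."""
    counts = Counter(t for t in tokenize(text) if t != acronym.lower())
    return [counts[w] for w in vocab]


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W, b, x):
    """Linear score of feature vector x for every sense."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def train(examples, acronym, epochs=200, lr=0.5):
    """Fit a softmax regression with plain SGD (assumed hyperparameters).

    examples: list of (context_text, sense) pairs, standing in for
    sentences mined from literature where the acronym is defined.
    """
    senses = sorted({sense for _, sense in examples})
    vocab = sorted({t for text, _ in examples
                    for t in tokenize(text) if t != acronym.lower()})
    X = [featurize(text, acronym, vocab) for text, _ in examples]
    y = [senses.index(sense) for _, sense in examples]
    W = [[0.0] * len(vocab) for _ in senses]
    b = [0.0] * len(senses)
    for _ in range(epochs):
        for x, target in zip(X, y):
            probs = softmax(class_scores(W, b, x))
            for k in range(len(senses)):
                # Gradient of cross-entropy loss w.r.t. the score of sense k.
                grad = probs[k] - (1.0 if k == target else 0.0)
                b[k] -= lr * grad
                for j, xi in enumerate(x):
                    if xi:
                        W[k][j] -= lr * grad * xi
    return {"acronym": acronym, "senses": senses, "vocab": vocab, "W": W, "b": b}


def predict(model, note):
    """Return (most probable sense, its probability) for a note."""
    x = featurize(note, model["acronym"], model["vocab"])
    probs = softmax(class_scores(model["W"], model["b"], x))
    best = max(range(len(probs)), key=probs.__getitem__)
    return model["senses"][best], probs[best]


# Invented toy "literature" contexts for the acronym PDA (illustration only).
TOY_EXAMPLES = [
    ("PDA closure with indomethacin in preterm infants with a murmur",
     "patent ductus arteriosus"),
    ("echocardiography confirmed a hemodynamically significant PDA in the infant",
     "patent ductus arteriosus"),
    ("nurses recorded observations on a handheld PDA device with software",
     "personal digital assistant"),
    ("a PDA application was used for mobile data entry",
     "personal digital assistant"),
]

model = train(TOY_EXAMPLES, "PDA")
note = "Infant with murmur, echocardiography shows PDA, started indomethacin"
print(predict(model, note))
```

In this sketch no clinical note is ever labeled for training; the model sees only literature-style contexts and is then applied to a note, which mirrors the transfer setting the abstract describes. A real system would need one such model per acronym, far larger training contexts, and a comparison against the random baseline of 1 divided by the number of senses.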

Bibliographic details

  • Source
    International journal of medical informatics | 2020, Issue 5 | 104101.1-104101.10 | 10 pages
  • Authors

  • Author affiliations

    Univ Penn Dept Comp Sci Philadelphia PA 19104 USA;

    Childrens Hosp Philadelphia Dept Pediat Div Neonatol Philadelphia PA 19104 USA|Univ Penn Perelman Sch Med Philadelphia PA 19104 USA;

    Univ Penn Perelman Sch Med Dept Biostat Epidemiol & Informat Philadelphia PA 19104 USA|Univ Penn Inst Biomed Informat Philadelphia PA 19104 USA|Univ Penn Ctr Excellence Environm Toxicol Philadelphia PA 19104 USA|Childrens Hosp Philadelphia Dept Biomed & Hlth Informat Philadelphia PA 19104 USA;

  • Indexing information
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification
  • Keywords

    Electronic health records; Natural language processing; Secondary reuse; Transfer learning;


