Journal: BMC Medical Informatics and Decision Making

A UMLS-based spell checker for natural language processing in vaccine safety


Abstract

Background: The Institute of Medicine has identified patient safety as a key goal for health care in the United States. Detecting vaccine adverse events is an important public health activity that contributes to patient safety. Reports of adverse events following immunization (AEFI) from surveillance systems contain free-text components that can be analyzed using natural language processing (NLP). To extract Unified Medical Language System (UMLS) concepts from free text and classify AEFI reports based on the concepts they contain, we first needed to clean the text by expanding abbreviations and shortcuts and correcting spelling errors. Our objective in this paper was to create a UMLS-based spelling error correction tool as a first step in the NLP pipeline for AEFI reports.

Methods: We developed spell checking algorithms using open source tools. We used de-identified AEFI surveillance reports to create free-text data sets for analysis. After expansion of abbreviated clinical terms and shortcuts, we performed spelling correction in four steps: (1) error detection, (2) word list generation, (3) word list disambiguation and (4) error correction. We then measured the performance of the resulting spell checker by comparing it to manual correction.

Results: We used 12,056 words to train the spell checker and tested its performance on 8,131 words. During testing, sensitivity, specificity, and positive predictive value (PPV) for the spell checker were 74% (95% CI: 74–75), 100% (95% CI: 100–100), and 47% (95% CI: 46–48), respectively.

Conclusion: We created a prototype spell checker that can be used to process AEFI reports. We used the UMLS Specialist Lexicon as the primary source of dictionary terms and the WordNet lexicon as a secondary source. The UMLS served as a domain-specific source of dictionary terms against which potentially misspelled words in the corpus were compared. The prototype's sensitivity was comparable to that of currently available tools, but its specificity was much higher. Its slow processing speed may be improved by trimming the prototype down to its most useful component algorithms. Other investigators may find the methods we developed useful for cleaning text using lexicons specific to their area of interest.
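The four correction steps listed in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the algorithms used, so the sketch substitutes string-similarity matching from Python's standard difflib module for both candidate generation and disambiguation. The two small word sets are hypothetical stand-ins for the UMLS Specialist Lexicon (primary) and WordNet (secondary), and all function names are illustrative.

```python
from difflib import SequenceMatcher, get_close_matches

# Hypothetical stand-ins for the primary (UMLS Specialist Lexicon) and
# secondary (WordNet) word lists; real lexicons would be loaded from files.
PRIMARY_LEXICON = {"fever", "injection", "site", "swelling", "vaccine", "seizure", "rash"}
SECONDARY_LEXICON = {"patient", "had", "at", "the", "and", "after", "a"}


def detect_errors(tokens):
    """Step 1 (error detection): flag alphabetic tokens found in neither lexicon."""
    known = PRIMARY_LEXICON | SECONDARY_LEXICON
    return [t for t in tokens if t.isalpha() and t.lower() not in known]


def generate_candidates(word, n=5, cutoff=0.7):
    """Step 2 (word list generation): try the primary lexicon, then the secondary."""
    for lexicon in (PRIMARY_LEXICON, SECONDARY_LEXICON):
        candidates = get_close_matches(word.lower(), sorted(lexicon), n=n, cutoff=cutoff)
        if candidates:
            return candidates
    return []


def disambiguate(word, candidates):
    """Step 3 (disambiguation): keep the candidate most similar to the input.

    Assumption: similarity ranking is used here only as a placeholder rule.
    """
    if not candidates:
        return None
    return max(candidates, key=lambda c: SequenceMatcher(None, word.lower(), c).ratio())


def correct(text):
    """Step 4 (error correction): replace each flagged token with its chosen candidate."""
    tokens = text.split()
    flagged = set(detect_errors(tokens))
    corrected = []
    for token in tokens:
        best = disambiguate(token, generate_candidates(token)) if token in flagged else None
        corrected.append(best if best else token)  # leave the token alone if no candidate
    return " ".join(corrected)


if __name__ == "__main__":
    print(correct("patient had fevr and sweling at the injection site"))
```

Consulting the domain lexicon before the general one mirrors the primary/secondary ordering described in the Conclusion. A token with no sufficiently similar candidate is left unchanged, which favors specificity over sensitivity.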
