A UMLS-based spell checker for natural language processing in vaccine safety

Herman D Tolentino; Michael D Matters; Wikke Walop; Barbara Law; Wesley Tong; Fang Liu; Paul Fontelo; Katrin Kohl; Daniel C Payne


Abstract

Background
The Institute of Medicine has identified patient safety as a key goal for health care in the United States. Detecting vaccine adverse events is an important public health activity that contributes to patient safety. Reports of adverse events following immunization (AEFI) from surveillance systems contain free-text components that can be analyzed using natural language processing (NLP). To extract Unified Medical Language System (UMLS) concepts from free text and classify AEFI reports based on the concepts they contain, we first needed to clean the text by expanding abbreviations and shortcuts and correcting spelling errors. Our objective in this paper was to create a UMLS-based spelling error correction tool as a first step in the NLP pipeline for AEFI reports.

Methods
We developed spell-checking algorithms using open-source tools. We used de-identified AEFI surveillance reports to create free-text data sets for analysis. After expanding abbreviated clinical terms and shortcuts, we performed spelling correction in four steps: (1) error detection, (2) word list generation, (3) word list disambiguation, and (4) error correction. We then measured the performance of the resulting spell checker by comparing its output with manual correction.

Results
We used 12,056 words to train the spell checker and tested its performance on 8,131 words. During testing, the spell checker's sensitivity, specificity, and positive predictive value (PPV) were 74% (95% CI: 74%–75%), 100% (95% CI: 100%–100%), and 47% (95% CI: 46%–48%), respectively.

Conclusion
We created a prototype spell checker that can be used to process AEFI reports. We used the UMLS Specialist Lexicon as the primary source of dictionary terms and the WordNet lexicon as a secondary source. The UMLS served as a domain-specific source of dictionary terms against which potentially misspelled words in the corpus were compared. The prototype's sensitivity was comparable to that of currently available tools, but its specificity was much higher. Its slow processing speed may be improved by trimming the spell checker down to its most useful component algorithms. Other investigators may find our methods useful for cleaning text using lexicons specific to their area of interest.
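The four correction steps listed under Methods can be illustrated with a minimal Python sketch. The abstract does not describe the algorithm behind each step, so the choices here are assumptions: candidate words are generated by edit distance 1 (a Norvig-style approach) and disambiguated by word frequency in a training corpus. The lexicons, abbreviation map, and training text are hypothetical toy stand-ins for the UMLS Specialist Lexicon, WordNet, and the AEFI training set, and every function name is illustrative rather than taken from the authors' tool.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the real resources: a tiny primary lexicon in
# place of the UMLS Specialist Lexicon, a secondary one in place of WordNet,
# and a toy abbreviation/shortcut map.
PRIMARY_LEXICON = {"fever", "rash", "swelling", "injection", "site", "arm",
                   "after", "vaccine", "at", "left", "and"}
SECONDARY_LEXICON = {"child", "cried", "hours"}
ABBREVIATIONS = {"inj": "injection", "hrs": "hours", "lt": "left"}
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def is_known(word):
    return word in PRIMARY_LEXICON or word in SECONDARY_LEXICON


def tokenize(text):
    # Lowercase, then split into letter runs, digit runs, or single symbols.
    return re.findall(r"[a-z]+|\d+|\S", text.lower())


def expand_abbreviations(tokens):
    """Pre-processing: replace known shortcuts before spell checking."""
    return [ABBREVIATIONS.get(t, t) for t in tokens]


def detect_errors(tokens):
    """Step 1: flag alphabetic tokens found in neither lexicon."""
    return [i for i, t in enumerate(tokens) if t.isalpha() and not is_known(t)]


def edits1(word):
    """Every string one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def generate_candidates(word):
    """Step 2: word list of dictionary terms within edit distance 1."""
    return sorted(w for w in edits1(word) if is_known(w))


def disambiguate(candidates, freq):
    """Step 3: choose the candidate most frequent in the training corpus,
    preferring primary-lexicon terms on ties."""
    if not candidates:
        return None
    return max(candidates, key=lambda w: (freq[w], w in PRIMARY_LEXICON))


def correct(text, freq):
    """Step 4: replace each flagged token with its chosen candidate, if any."""
    tokens = expand_abbreviations(tokenize(text))
    for i in detect_errors(tokens):
        best = disambiguate(generate_candidates(tokens[i]), freq)
        if best is not None:
            tokens[i] = best
    return " ".join(tokens)


# Hypothetical training text, used only to estimate word frequencies.
TRAINING_TEXT = "fever and rash at injection site after vaccine fever swelling"
freq = Counter(tokenize(TRAINING_TEXT))

print(correct("Fevr and swleling at lt arm after inj", freq))
```

With this toy data the sketch prints `fever and swelling at left arm after injection`: `lt` and `inj` are expanded first, then `fevr` (one insertion away) and `swleling` (one transposition away) are flagged and corrected. Restricting candidates to edit distance 1 keeps word lists short. A realistic checker would likely also need larger edit distances, phonetic matching, or surrounding context to disambiguate.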

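The Results report sensitivity, specificity, and PPV, each with a 95% confidence interval, from comparing the spell checker against manual correction. The sketch below shows how those figures follow from a 2×2 table of word-level outcomes. The counts are hypothetical, not the study's data. The Wald normal-approximation interval is also an assumption, because the abstract does not state which interval method was used.

```python
import math


def proportion_ci(successes, total, z=1.96):
    """Point estimate with a Wald (normal-approximation) 95% interval,
    clipped to [0, 1]. The interval method is an assumption."""
    p = successes / total
    half = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half), min(1.0, p + half)


def evaluate(tp, fp, tn, fn):
    """Compare spell-checker output against manual correction, word by word.

    tp: misspelled words the checker corrected as the annotator did
    fp: words the checker changed that the annotator left alone
    tn: correctly spelled words both left unchanged
    fn: misspelled words the annotator corrected but the checker missed
    """
    return {
        "sensitivity": proportion_ci(tp, tp + fn),
        "specificity": proportion_ci(tn, tn + fp),
        "ppv": proportion_ci(tp, tp + fp),
    }


# Hypothetical counts for illustration only (not the study's data).
for name, (p, lo, hi) in evaluate(tp=30, fp=10, tn=950, fn=10).items():
    print(f"{name}: {p:.0%} (95% CI: {lo:.0%}-{hi:.0%})")
```

Unlike specificity, PPV depends on how rare true misspellings are. When correctly spelled words vastly outnumber errors, even a small false-positive rate can pull PPV well below sensitivity while specificity stays near 100%.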

Bibliographic details

Source: BMC Medical Informatics and Decision Making, 2007, issue 1
Original format: PDF
Chinese Library Classification: Medicine, health

Similar documents

1. The use of natural language processing to identify vaccine-related anaphylaxis at five health care systems in the Vaccine Safety Datalink [J]. Pharmacoepidemiology and Drug Safety. 2020, issue 2.
2. Zheng Chengyi, Yu Wei, Xie Fagen. The use of natural language processing to identify Tdap-related local reactions at five health care systems in the Vaccine Safety Datalink [J]. International Journal of Medical Informatics. 2019, issue JULa.
3. Xie Wang, Yun Cao, Bo Mao. Spatio-temporal Semantic Analysis of Safety Production Accidents in Grain Depot based on Natural Language Processing [C]. International Joint Conference on Web Intelligence and Intelligent Agent Technology. 2020.
4. Tixier, Antoine Jean-Pierre. Leveraging unstructured construction injury reports to predict safety outcomes and model safety risk using Natural Language Processing, Machine Learning, and probability theory [D]. 2015.
