Improving the Autocoding of Injury Narratives Using a Combination of Machine Learning Methods and Natural Language Processing Techniques

Abstract

The "external cause of injury code" (E-code) field in injury datasets records the specific cause of an injury, such as a fall, cut, burn, or electric shock. E-coded injury data are important for identifying the factors behind the most serious injuries and for prioritizing prevention efforts. E-codes are typically assigned to injury records by trained human coders who read the injury narrative, a process that is costly in time and resources. Machine learning (ML) models offer a promising alternative for quickly assigning E-codes from the injury narrative (autocoding), but they cannot predict all categories with high accuracy. The primary reasons for low prediction accuracy include the large number of categories, poor-quality training data, a heavily skewed distribution of data across categories, and the sparse and noisy nature of injury narratives. Apart from these data-related challenges, a fundamental reason for the low autocoding accuracy of classical ML models is their bag-of-words representation: it captures the statistical distribution of words across categories but has no knowledge of the syntax, semantics, or pragmatics of the narrative text.

Natural language processing (NLP) approaches can extract deeper linguistic concepts from the narrative and supplement ML models to improve autocoding performance. This study examined "non-targeted" NLP approaches and proposed "targeted" NLP approaches, based on the causal model of E-codes, to improve autocoding accuracy. Three methods of supplementing the ML model with causal concepts were examined: rule-based methods, narrative text transformation, and adding causal-concept nodes to a Bayesian network.

The non-targeted NLP approaches, "Syntactic Tagging" and "Syntactic Tagging with Hypernym Mapping," used with a Multinomial Naive Bayes (MNB) model, resulted in lower prediction performance than plain narrative text.
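The bag-of-words limitation described above can be illustrated with a minimal Multinomial Naive Bayes autocoder written from scratch. This is a sketch under stated assumptions, not the dissertation's implementation: the tokenizer, the smoothing parameter `alpha=1.0`, the three category labels, and the six training narratives are all hypothetical toy values. Because word order is discarded, the classifier assigns "a box fell on the worker" (an object falling onto a person) to FALL purely on the strength of the word "fell".

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase bag-of-words tokenizer: word order is discarded."""
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNB:
    """Minimal Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # assumed smoothing value

    def fit(self, narratives, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # per-category word frequencies
        for text, label in zip(narratives, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, text, label):
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)  # log prior
        denom = self.total_words[label] + self.alpha * len(self.vocab)
        for word in tokenize(text):
            if word not in self.vocab:
                continue  # ignore words never seen in training
            score += math.log((self.word_counts[label][word] + self.alpha) / denom)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))


# Toy, made-up narratives (hypothetical; not from the dissertation's data).
train_texts = [
    "employee fell from ladder while painting",
    "slipped on wet floor and fell",
    "received shock from exposed live wire",
    "electrocuted while repairing power cable",
    "cut hand on box cutter",
    "laceration to finger from knife",
]
train_labels = ["FALL", "FALL", "ELECTRIC CURRENT", "ELECTRIC CURRENT", "CUT", "CUT"]

model = MultinomialNB().fit(train_texts, train_labels)
print(model.predict("worker fell off scaffold"))  # FALL
print(model.predict("a box fell on the worker"))  # FALL: no notion of *who* fell
```

Replacing `train_texts` and `train_labels` with real coded narratives leaves the scoring logic unchanged; it is the standard multinomial likelihood with add-alpha smoothing over an unordered multiset of words.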
The targeted NLP approaches improved classification performance for the targeted categories. For the E-code "Electric Current," co-occurrence rules based on causal elements identified cases with a very high positive predictive value (PPV) of 98% and improved the prediction performance of MNB, support vector machine (SVM), and logistic regression models. The causal concept "Person Fell" was identified using syntactic parsing and word-sequence rules with a PPV of 92%, and embedding it into the narrative improved classification performance for FALL-related categories. Adding causal concepts as nodes in the Bayesian network yielded only minor improvements in prediction performance.
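The targeted approaches can be sketched as three small pieces: a co-occurrence rule for "Electric Current", a word-sequence rule for the "Person Fell" concept whose output is appended to the narrative as an extra token, and the metric used to judge such rules, PPV = TP / (TP + FP). This is a hypothetical sketch: the keyword sets, the three-token window, and the `PERSONFELL` token are assumptions, and the rule uses word sequence only, whereas the dissertation also used syntactic parsing. Unlike a bag-of-words model, the sequence rule does not fire on "box fell on worker".

```python
import re

# Hypothetical causal-element keyword sets (assumptions, not the dissertation's rules).
ELECTRIC_SOURCE = {"wire", "outlet", "cable", "power", "current", "voltage", "live"}
ELECTRIC_EFFECT = {"shock", "shocked", "electrocuted", "electrocution"}
PERSON_WORDS = {"he", "she", "employee", "worker", "patient"}
FALL_VERBS = {"fell", "fall", "falls", "slipped", "tripped"}


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def electric_current_rule(narrative):
    """Co-occurrence rule: fires only if a source term AND an effect term both appear."""
    words = set(tokens(narrative))
    return bool(words & ELECTRIC_SOURCE) and bool(words & ELECTRIC_EFFECT)


def person_fell(narrative, window=3):
    """Word-sequence rule: a person word followed within `window` tokens by a fall verb.

    Simplification: no syntactic parsing is done here.
    """
    words = tokens(narrative)
    for i, word in enumerate(words):
        following = words[i + 1:i + 1 + window]
        if word in PERSON_WORDS and any(w in FALL_VERBS for w in following):
            return True
    return False


def transform(narrative):
    """Narrative text transformation: append a concept token (hypothetical name PERSONFELL)."""
    return narrative + " PERSONFELL" if person_fell(narrative) else narrative


def ppv(fired, actual):
    """Positive predictive value = TP / (TP + FP), computed over cases where the rule fired."""
    hits = [a for f, a in zip(fired, actual) if f]
    return sum(hits) / len(hits) if hits else float("nan")


narratives = [
    "worker received shock from live wire",
    "was shocked to see the power outage",  # emotional "shocked": a false positive
    "employee fell from ladder",
]
actual = [True, False, False]  # toy gold labels: is the case "Electric Current"?
fired = [electric_current_rule(n) for n in narratives]
print(fired)               # [True, True, False]
print(ppv(fired, actual))  # 0.5
print(transform("employee fell from ladder"))  # employee fell from ladder PERSONFELL
print(transform("box fell on worker"))         # box fell on worker
```

Feeding transformed narratives into a bag-of-words classifier turns the concept into one more feature the model can weight, which is the general idea behind the narrative text transformation method; the false positive above shows why rule vocabularies need care to reach a high PPV.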

Bibliographic Record

Author: Nanda, Gaurav
Author affiliation: Purdue University
Degree-granting institution: Purdue University
Discipline: Industrial engineering
Degree: Ph.D.
Year: 2017
Pages: 189
Original format: PDF
Language: English

Similar Documents

1. A New Method to Identify Short-Text Authors Using Combinations of Machine Learning and Natural Language Processing Techniques [J]. Biveeken Vijayakumar, Muhammad Marwan Muhammad Fuad. Procedia Computer Science, 2019, no. 11.
2. A New Method to Identify Short-Text Authors Using Combinations of Machine Learning and Natural Language Processing Techniques [J]. Biveeken Vijayakumar, Muhammad Marwan Muhammad Fuad. Procedia Computer Science, 2019, no. 1.
3. Machine Vision Methods, Natural Language Processing, and Machine Learning Algorithms for Automated Dispersion Plot Analysis and Chemical Identification from Complex Mixtures [J]. Yeap Danny, Hichwa Paul T., Rajapakse Maneeshin Y., et al. Analytical Chemistry, 2019, no. 16.
4. Mapping of Narrative Text Fields To ICD-10 Codes Using Natural Language Processing and Machine Learning [C]. Risuna Nkolele, Turgay Celik, Simphiwe Zitha. Widening Natural Language Processing Workshop, 2020.
5. Enhancing Ontology Learning with Machine Learning and Natural Language Processing Techniques [D]. Liu, Yue. 2019.
6. Applying natural language processing and machine learning techniques to patient experience feedback: a systematic review [O]. Mustafa Khanbhai, Patrick Anyadi, Joshua Symons, et al. 2021.
7. Classification of Sentimental Reviews Using Natural Language Processing Concepts and Machine Learning Techniques [O]. Agrawal Ankit. 2015.
