Improving the Autocoding of Injury Narratives Using a Combination of Machine Learning Methods and Natural Language Processing Techniques

Abstract

The field "external cause of injury code (E-code)" in injury datasets indicates the specific cause of an injury, such as a fall, cut, burn, or electric shock. E-coded injury data is important for identifying the factors that cause the most serious injuries and for prioritizing prevention efforts. E-codes are typically assigned to injury records by trained human coders based on the injury narrative, a process that is expensive in terms of time and resources. Machine Learning (ML) models offer a promising alternative for quickly assigning E-codes (autocoding) based on the injury narrative, but they are not able to predict all categories with high accuracy. The primary reasons for low prediction accuracy include the large number of categories, the poor quality of training data, the heavily skewed distribution of data, and the sparse and noisy nature of injury narratives. Apart from these data-related challenges, one of the fundamental reasons behind the low autocoding accuracy of classical ML models is that they use the bag-of-words approach, which considers the statistical distribution of words in different categories but has no knowledge of the syntax, semantics, and pragmatics of the narrative text.

Natural Language Processing (NLP) approaches can be used to extract deeper linguistic concepts from the narrative and supplement the ML models to improve autocoding performance. This study examined the use of "non-targeted" NLP approaches and proposed using "targeted" NLP approaches, based on the causal model of E-codes, for improving autocoding accuracy. Different methods of supplementing the ML model with causal concepts were examined: rule-based methods, narrative text transformation, and adding nodes to a Bayesian network.

The non-targeted NLP approaches, "Syntactic Tagging" and "Syntactic Tagging with Hypernym Mapping", used with a Multinomial Naive Bayes (MNB) model resulted in lower prediction performance than using plain narrative text.

The targeted NLP approaches resulted in improved classification performance for the target category. For the E-code "Electric Current", co-occurrence rules based on causal elements were able to identify cases with extremely high (98%) Positive Predictive Value (PPV) and improved the prediction performance of the MNB, Support Vector Machine, and Logistic Regression models. The causal concept "Person Fell" was identified using syntactic parsing and word-sequence rules with extremely high PPV (92%), and embedding it in the narrative resulted in improved classification performance for FALL-related categories. Adding causal concepts as nodes in the Bayesian network resulted in minor improvements in prediction performance.
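
As a rough illustration of the rule-based idea described in the abstract, the sketch below shows what a co-occurrence rule, its PPV, and a simple narrative text transformation might look like. It is not the thesis's implementation: the word lists, the concept token `CONCEPT_ELECTRIC_CURRENT`, the function names, and the five toy narratives are all made up for this example, and the thesis reports narrative embedding for the "Person Fell" concept found through syntactic parsing rather than for a keyword rule like this one.

```python
import re

# Hypothetical causal-element vocabularies (assumptions for illustration only;
# the abstract does not list the actual terms used in the thesis).
ENERGY_SOURCE = {"wire", "outlet", "socket", "cable", "circuit"}
CONTACT_EVENT = {"shock", "shocked", "electrocuted", "jolt"}


def tokens(narrative):
    """Lowercase the narrative and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", narrative.lower()))


def rule_fires(narrative):
    """Co-occurrence rule: fire only if both causal elements appear."""
    words = tokens(narrative)
    return bool(words & ENERGY_SOURCE) and bool(words & CONTACT_EVENT)


def positive_predictive_value(predicted, actual):
    """PPV = true positives / all cases the rule flagged as positive."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    pred_pos = sum(1 for p in predicted if p)
    return true_pos / pred_pos if pred_pos else float("nan")


def tag_narrative(narrative, concept_token="CONCEPT_ELECTRIC_CURRENT"):
    """Narrative text transformation: append a concept token when the rule
    fires, so a downstream bag-of-words model sees it as an extra word."""
    return f"{narrative} {concept_token}" if rule_fires(narrative) else narrative


# Made-up toy records: (narrative, is the true E-code "Electric Current"?)
records = [
    ("worker got shocked by live wire while repairing outlet", True),
    ("employee fell from ladder while changing light bulb", False),
    ("tripped over extension cable and was shocked to find wrist broken", False),
    ("electrocuted when drill hit buried cable", True),
    ("cut finger on sheet metal", False),
]

predicted = [rule_fires(text) for text, _ in records]
actual = [label for _, label in records]
ppv = positive_predictive_value(predicted, actual)
print(f"Rule fired on {sum(predicted)} of {len(records)} records, PPV = {ppv:.2f}")
```

The third toy record is a deliberate false positive: both word sets match even though the injury is a fall. This is the kind of error that purely word-level rules can make and that syntactic information may help to avoid.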

Bibliographic record

  • Author: Nanda, Gaurav
  • Author affiliation: Purdue University
  • Degree-granting institution: Purdue University
  • Discipline: Industrial engineering
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 189
  • Original format: PDF
  • Language: English
