SPIE Conference on Computer-Aided Diagnosis

Classifying abnormalities in computed tomography radiology reports with rule-based and natural language processing models



Abstract

Purpose: When applying machine learning algorithms to the classification and detection of abnormalities in medical imaging, many researchers struggle to obtain enough labeled data. This is especially difficult for modalities such as computed tomography (CT), which may have 1000 or more slice images per case. To address this problem, we plan to use machine learning algorithms to identify abnormalities within existing radiologist reports, thus creating case-level labels that may be used for weakly supervised training on the image data. We used a two-stage procedure to label the CT reports. In the first stage, a rule-based system automatically labeled a smaller set of cases with high accuracy. In the second stage, we developed machine learning algorithms using the labels from the rule-based system and word vectors learned without supervision from unlabeled CT reports.

Method: In this study, we used approximately 24,000 CT reports from Duke University Health System. We initially focused on three organs: the lungs, liver/gallbladder, and kidneys. We first developed a rule-based system that can quickly identify certain types of abnormalities within CT reports with high accuracy. For each organ and disease combination, we produced several hundred cases with rule-based labels. These labels were combined with word vectors generated using word2vec from all the unlabeled reports to train two different machine learning algorithms: (a) average of word vectors merged by logistic regression, and (b) recurrent neural network (RNN).

Result: Performance was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC) on an independent test set of 440 reports in which those organs were manually labeled as normal or abnormal by clinical experts. For lungs, the AUC was 0.796 for average word vector and 0.827 for RNN. For liver, the AUC was 0.683 for average word vector and 0.791 for RNN. For kidneys, it was 0.786 for average word
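The pieces named in the abstract (rule-based labeling, averaged word vectors, and ROC AUC evaluation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not give the actual rules, so the keyword patterns, the negation cue list, and the function names `rule_label`, `average_vector`, and `roc_auc` are hypothetical, and the tiny vectors stand in for embeddings that the study learned with word2vec. The logistic regression and RNN stages are not shown.

```python
import re

# Hypothetical keyword rules; the abstract does not list the real patterns.
ABNORMAL_PATTERNS = {
    "lungs": re.compile(r"\b(nodule|mass|consolidation|effusion)\b", re.I),
    "kidneys": re.compile(r"\b(cyst|stone|hydronephrosis)\b", re.I),
}
# Hypothetical negation cues, searched only in the text before the finding.
NEGATION = re.compile(r"\b(no|without|negative for)\b[^.]*$", re.I)


def rule_label(sentence, organ):
    """Return 1 (abnormal), 0 (finding negated), or None (no rule fires)."""
    match = ABNORMAL_PATTERNS[organ].search(sentence)
    if match is None:
        return None  # leave the case unlabeled rather than guess
    if NEGATION.search(sentence[:match.start()]):
        return 0
    return 1


def average_vector(tokens, vectors):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Assumes both classes are present.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(rule_label("No pulmonary nodule or mass.", "lungs"))
    print(rule_label("There is a 5 mm nodule in the right upper lobe.", "lungs"))
    toy_vectors = {"nodule": [1.0, 0.0], "lobe": [0.0, 1.0]}  # stand-in embeddings
    print(average_vector(["nodule", "lobe", "unseen"], toy_vectors))
    print(roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.2]))
```

Returning `None` when no rule fires mirrors the idea that the rule-based stage labels only a smaller set of cases, leaving the rest for the learned models. The pairwise AUC is quadratic in the number of reports, which is acceptable for a test set of a few hundred.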
