Conference: IEEE EMBS International Conference on Biomedical and Health Informatics

Coarse-to-fine multi-task training of convolutional neural networks for automated information extraction from cancer pathology reports


Abstract

Information extraction and coding of free-text pathology reports is an important activity for cancer registries to support national cancer surveillance. Cancer registrars must process high volumes of pathology reports on an annual basis. In this study, we investigated an automated approach using coarse-to-fine training of convolutional neural networks (CNNs) for extracting the primary site, histological grade, and laterality from unstructured cancer pathology text reports. Our proposed training scheme consists of two stages. In the first stage, multi-task learning (MTL) with hard parameter sharing is used to train a multi-task CNN (MT-CNN) model on all the tasks. Then, the MT-CNN model parameters are used to initialize a CNN model for each task, which is fine-tuned individually on its corresponding dataset. The performance of our proposed approach was compared against a state-of-the-art CNN and the commonly used SVM classifier. We observed that the proposed model consistently outperformed the baseline models, especially for the less prevalent classes. Specifically, the proposed training approach achieved a micro-F score of 0.7749 over 12 ICD-O-3 topography codes, which is a significant improvement over the state-of-the-art CNN (0.7101) and SVM (0.6019) classifiers. The results also demonstrate the potential of the proposed method for handling class imbalance within each task: it significantly improved the macro-F score by 24% and 12% on the primary site and histological grade tasks, respectively.
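The abstract reports both micro-F and macro-F scores and ties the macro-F gains to class imbalance. The following is a minimal sketch of how the two averages differ for single-label multi-class predictions. It is not taken from the paper: the helper name `f_scores` and the toy labels are hypothetical, and the ICD-O-3 style codes are used only as example strings.

```python
from collections import Counter


def f_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro-F pools the counts over all classes before computing F1,
    # so frequent classes dominate the score.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-F averages the per-class F1 values, so every class counts
    # equally regardless of how many reports it has.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Made-up, imbalanced toy data: 8 reports of one class, 2 of another,
# scored against a classifier that always predicts the majority class.
y_true = ["C50"] * 8 + ["C34"] * 2
y_pred = ["C50"] * 10

micro, macro = f_scores(y_true, y_pred)
print(f"micro-F = {micro:.2f}, macro-F = {macro:.2f}")
# prints "micro-F = 0.80, macro-F = 0.44"
```

In this toy case the majority-class predictor keeps a high micro-F while its macro-F collapses, because the minority class contributes an F1 of zero. This is why an improvement in macro-F is commonly read as better handling of the less prevalent classes.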
