Journal: PLoS One

Using machine learning for predicting cervical cancer from Swedish electronic health records by mining hierarchical representations



Abstract

Electronic health records (EHRs) contain rich documentation regarding disease symptoms and progression, but EHR data is challenging to use for diagnosis prediction due to its high dimensionality, relative scarcity, and substantial level of noise. We investigated how best to represent EHR data for predicting cervical cancer, a serious disease where early detection is beneficial for the outcome of treatment. A case group of 1321 patients with cervical cancer was matched to ten times as many controls, and for both groups several types of events were extracted from their EHRs. These events included clinical codes, lab results, and contents of free text notes retrieved using an LSTM neural network. Clinical events are described with great variation in EHR texts, leading to a very large feature space. Therefore, an event hierarchy inferred from the textual events was created to represent the clinical texts. Overall, the events extracted from free text notes contributed the most to the final prediction, and the hierarchy of textual events further improved performance. Four classifiers were evaluated for predicting a future cancer diagnosis; Random Forest achieved the best results, with an AUC ranging from 0.70 one year before diagnosis to 0.97 one day before diagnosis. We conclude that our approach is sound and had excellent discrimination at diagnosis, but only modest discrimination capacity before this point. Since our study objective was to predict the disease earlier than that, we propose that further work should consider extending patient histories, e.g. by integrating primary health records that precede referral to hospital.
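The idea of representing textual events through a hierarchy can be illustrated with a minimal sketch. It is not the authors' method: in the study the hierarchy is inferred from the textual events, whereas here the `PARENT` map, the event strings, and the helper names `ancestors` and `hierarchical_features` are all hypothetical and hand-written for illustration. The sketch only shows how counting each event at its own level and at every ancestor level lets differently worded events share a coarser feature.

```python
from collections import Counter

# Hypothetical child -> parent links. The study infers its hierarchy from the
# textual events themselves; this hand-written map is only an assumption.
PARENT = {
    "heavy vaginal bleeding": "vaginal bleeding",
    "postcoital bleeding": "vaginal bleeding",
    "vaginal bleeding": "bleeding",
    "lower abdominal pain": "abdominal pain",
    "abdominal pain": "pain",
}


def ancestors(event, parent=PARENT):
    """Return the event followed by all of its ancestors, most specific first."""
    chain = [event]
    seen = {event}
    while chain[-1] in parent:
        nxt = parent[chain[-1]]
        if nxt in seen:  # guard against accidental cycles in the map
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain


def hierarchical_features(events, parent=PARENT):
    """Count each event at its own level and at every ancestor level."""
    counts = Counter()
    for event in events:
        counts.update(ancestors(event, parent))
    return counts


# Hypothetical events extracted from one patient's free text notes.
patient_events = [
    "heavy vaginal bleeding",
    "postcoital bleeding",
    "lower abdominal pain",
]
features = hierarchical_features(patient_events)
print(dict(features))
```

In this toy example the two differently worded bleeding events both contribute to the shared "vaginal bleeding" and "bleeding" features, which is one way a hierarchy can reduce the sparsity of a very large textual feature space before the counts are passed to a classifier.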
