Topic models and dynamic prediction models and their applications in document retrieval and healthcare.

Abstract

Statistical topic models have been widely studied in text mining as an effective approach for extracting latent topics from unstructured text documents. We present a robust and computationally efficient hierarchical Bayesian model that captures topic correlations using the Generalized Dirichlet (GD) distribution. The resulting model, GD-LDA, avoids over-fitting as the number of topics increases. We report results using Empirical Likelihood (EL) on four public datasets. We show the application of topic models in two different domains: 1) information retrieval, and 2) dynamic prediction models applied to health care.

In information retrieval, we propose to leverage statistical topic modeling techniques in relevance feedback, obtaining a better estimate of context by including corpus-level information about the document. We show results on the OHSUMED dataset for three different variants and obtain higher performance, by up to 12.5% in Mean Average Precision (MAP).

Patients often search the web for information about treatments and diseases after they are discharged from the hospital. However, searching for medical information on the web is challenging because the same disease or treatment can be described by many related terms and synonyms.

We present a method to retrieve healthcare-related documents using the patient discharge document. We show that the proposed framework outperformed the winner of the retrieval task of the CLEF eHealth 2013 Challenge by 68% in MAP and by 13% in NDCG.

We present a method to dynamically estimate the probability of mortality inside the Intensive Care Unit (ICU) by combining heterogeneous data. We propose a method based on Generalized Linear Dynamic Models that models the probability of mortality as a latent state that evolves over time. This framework allows us to combine different types of features (lab results, vital sign readings, doctor and nurse notes, etc.) into a single state. We update this state each time new patient data is observed. We test our proposed approach using 15,000 Electronic Medical Records (EMRs) obtained from the MIMIC II public data set.

We extend this dynamic mortality estimation model in two ways. First, we estimate the probability that a patient is readmitted after being discharged from the ICU and transferred to a lower-level care unit. Second, we present a method to dynamically predict the failure of physiological subsystems in patients admitted to the ICU using heterogeneous data. We model the probability of failure in each subsystem as a latent state. Then, we estimate the probability of patient mortality as a combination of the estimated failure propensities of all subsystems. We propose a method for imputing missing values that exploits the non-ignorable nature of the patient data. Experimental results show that our method outperforms other approaches in the literature in terms of AUC, sensitivity, and specificity. In addition, we show that combining different features (numerical and text) increases the prediction performance of the proposed approach.
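
The retrieval results in the abstract are reported in Mean Average Precision (MAP). As a rough illustration of the standard definition of that metric, and not the evaluation code used in the thesis, the sketch below computes MAP from ranked result lists and binary relevance judgments. The function names, query IDs, and document IDs are hypothetical, and each ranking is assumed to contain no duplicate documents.

```python
from typing import Hashable, Mapping, Sequence, Set


def average_precision(ranked_ids: Sequence[Hashable],
                      relevant_ids: Set[Hashable]) -> float:
    """Average precision for one query under binary relevance.

    Sums precision@k at every rank k where a relevant document appears,
    then divides by the total number of relevant documents.
    Assumes ranked_ids contains no duplicates.
    """
    if not relevant_ids:
        return 0.0  # convention chosen here: no relevant docs -> AP of 0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


def mean_average_precision(rankings: Mapping[Hashable, Sequence[Hashable]],
                           judgments: Mapping[Hashable, Set[Hashable]]) -> float:
    """MAP: the mean of per-query average precision over all queries."""
    if not rankings:
        return 0.0
    total = sum(
        average_precision(ranked, judgments.get(query, set()))
        for query, ranked in rankings.items()
    )
    return total / len(rankings)


# Hypothetical toy data: two queries with hand-made rankings and judgments.
rankings = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d6"]}
judgments = {"q1": {"d1", "d3"}, "q2": {"d6"}}
print(mean_average_precision(rankings, judgments))
```

For query `q1` the relevant documents appear at ranks 1 and 3, so its average precision is (1/1 + 2/3) / 2; for `q2` the single relevant document is at rank 2, giving 1/2. MAP is the mean of those two values.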

Bibliographic record

  • Author

    Caballero Barajas, Karla L.

  • Author affiliation

    University of California, Santa Cruz

  • Degree-granting institution: University of California, Santa Cruz
  • Subject: Information technology; Statistics; Information science
  • Degree: Ph.D.
  • Year: 2015
  • Pages: 175 p.
  • Total pages: 175
  • Original format: PDF
  • Language: English
