Acta diabetologica

Can internet search engine queries be used to diagnose diabetes? Analysis of archival search data


Abstract

Aims: Diabetes is often diagnosed late. This study aimed to assess the possibility of detecting diabetes earlier from search data, using predictive models trained on large-scale data.

Methods: We extracted all English-language queries made to Bing by people in the USA during 1 year and identified queries containing symptoms of diabetes. We compared the ability of four prediction models (linear regression, logistic regression, decision tree and random forest) to distinguish between users who stated that they had been diagnosed with diabetes and users who did not refer to diabetes or diabetes drugs but queried about at least one of the symptoms.

Results: We identified 11,050 "new diabetes users" who stated they had been diagnosed with diabetes and approximately 11.5 million "control users" who queried about symptoms without querying for terms related to diabetes. Both the logistic regression and the random forest models distinguished between the two populations with an area under the curve (AUC) of 0.92, which translates to a positive predictive value of 56% at a false-positive rate of 1%. The model could identify patients up to 240 days before they mentioned being diagnosed.

Conclusions: Some undiagnosed diabetes patients can be detected accurately from their symptom queries to a search engine. Such earlier diagnosis, especially in cases of type 1 diabetes, could be clinically meaningful. The ability of search engines to serve as a population-wide screening tool could potentially be improved using additional data provided by users.
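The Results section of the abstract quotes both an AUC and a positive predictive value (PPV) at a fixed 1% false-positive rate. The sketch below, run on invented toy scores (not the study's data, models or code), shows how each kind of number can be computed from classifier scores; the helpers `roc_auc`, `ppv_at_fpr` and `ppv_from_rates` and the `TOY_*` lists are hypothetical. It also illustrates that PPV at a fixed FPR depends on how common cases are in the evaluated population, which the abstract does not state.

```python
"""Toy sketch: ROC AUC and PPV at a fixed false-positive rate.

All scores below are invented for illustration; they are NOT the study's data,
and the helper names are hypothetical.
"""
from typing import List, Tuple

# Hypothetical classifier scores (higher = more likely to be a diabetes case).
TOY_NEG: List[float] = [i / 200 for i in range(99)] + [0.85]  # 100 controls
TOY_POS: List[float] = [0.95, 0.9, 0.8, 0.6, 0.2525]          # 5 cases


def roc_auc(pos: List[float], neg: List[float]) -> float:
    """AUC as the probability that a random case outranks a random control
    (Mann-Whitney formulation; ties count as one half)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ppv_at_fpr(pos: List[float], neg: List[float],
               max_fpr: float) -> Tuple[float, float, float]:
    """Return (threshold, TPR, PPV) for the rule 'score >= threshold', using the
    lowest threshold whose false-positive rate does not exceed max_fpr."""
    best = None
    for t in sorted(set(pos) | set(neg), reverse=True):
        fp = sum(1 for n in neg if n >= t)
        if fp / len(neg) > max_fpr:
            break  # FPR never decreases as the threshold is lowered
        tp = sum(1 for p in pos if p >= t)
        best = (t, tp / len(pos), tp / (tp + fp) if tp + fp else 0.0)
    if best is None:
        raise ValueError("no threshold satisfies max_fpr")
    return best


def ppv_from_rates(tpr: float, fpr: float, prevalence: float) -> float:
    """Bayes' rule: PPV = TPR*p / (TPR*p + FPR*(1 - p)) for case prevalence p."""
    hits = tpr * prevalence
    return hits / (hits + fpr * (1.0 - prevalence))


if __name__ == "__main__":
    print(f"AUC = {roc_auc(TOY_POS, TOY_NEG):.3f}")  # AUC = 0.898
    t, tpr, ppv = ppv_at_fpr(TOY_POS, TOY_NEG, max_fpr=0.01)
    print(f"threshold={t:.2f} TPR={tpr:.2f} PPV={ppv:.2f}")  # threshold=0.60 TPR=0.80 PPV=0.80
    # Same TPR and FPR, different case prevalence -> very different PPV.
    for p in (0.5, 0.05, 0.001):
        print(f"prevalence={p}: PPV={ppv_from_rates(tpr, 0.01, p):.3f}")
```

Lowering the threshold trades more true positives for more false positives, and the prevalence loop shows why a PPV figure is only interpretable alongside the case/control mix it was measured on.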