Journal: JMIR Medical Informatics

Increased Workload for Systematic Review Literature Searches of Diagnostic Tests Compared With Treatments: Challenges and Opportunities

Abstract

Background: Comprehensive literature searches are conducted over multiple medical databases in order to meet stringent quality standards for systematic reviews. These searches are often very laborious, with authors frequently screening thousands of articles by hand. Information retrieval (IR) techniques have proven increasingly effective in improving the efficiency of this process. IR challenges for systematic reviews involve building classifiers using training data with very high class imbalance, and meeting the requirement for near-perfect recall on relevant studies. Traditionally, most systematic reviews have focused on questions relating to treatment. The last decade has seen a large increase in the number of systematic reviews of diagnostic test accuracy (DTA).

Objective: We aim to demonstrate that DTA reviews comprise an especially challenging subclass of systematic reviews with respect to the workload required for literature screening. We identify specific challenges for the application of IR to literature screening for DTA reviews, and identify potential directions for future research.

Methods: We hypothesize that IR for DTA reviews faces three additional challenges compared to systematic reviews of treatments. These are an increased class imbalance, a broader definition of the target class, and relative inadequacy of available metadata (ie, Medical Subject Headings [MeSH] terms for the Medical Literature Analysis and Retrieval System Online [MEDLINE]). Assuming these hypotheses to be true, we identify five manifestations that should appear when literature searches for DTA reviews are compared with those for treatment reviews: an increase in the average number of articles screened, an increase in the average number of full-text articles obtained, a decrease in the number of included studies as a percentage of full-text articles screened, a decrease in the number of included studies as a percentage of all articles screened, and a decrease in the number of full-text articles obtained as a percentage of all articles screened. As of July 12, 2013, 13 published Cochrane DTA reviews were available and all were included. For each DTA review, we randomly selected 15 treatment reviews published by the corresponding Cochrane Review Group (N=195). We then statistically tested for differences in these five manifestations between the DTA and treatment reviews.

Results: Despite low statistical power caused by the small sample size for DTA reviews, strong (P<.01) or very strong (P<.001) evidence was obtained to support three of the five expected manifestations, with evidence for at least one manifestation of each hypothesis. The observed differences in effect sizes are substantial, demonstrating the practical difference in reviewer workload.

Conclusions: Reviewer workload (volume of citations screened) when screening literature for systematic reviews of DTA is especially high. This corresponds to greater rates of class imbalance when training classifiers for automating literature screening for DTA reviews. Addressing concerns such as lower-quality metadata and effectively modelling the broader target class could help to alleviate such challenges, providing possible directions for future research.
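The five measures compared in the Methods can be derived from three counts per review: articles screened, full-text articles obtained, and studies included. The following is a minimal sketch of that arithmetic, not the authors' analysis code. The class name `ScreeningCounts`, the function `workload_measures`, and the example counts are all hypothetical and are not taken from any review in the study.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Screening counts for one review (illustrative values only)."""
    screened: int   # titles/abstracts screened
    full_text: int  # full-text articles obtained
    included: int   # studies included in the final review


def workload_measures(c: ScreeningCounts) -> dict:
    """Return the five workload measures named in the abstract."""
    return {
        "articles_screened": c.screened,
        "full_text_obtained": c.full_text,
        # Included studies as a percentage of full-text articles screened
        "included_pct_of_full_text": 100 * c.included / c.full_text,
        # Included studies as a percentage of all articles screened;
        # a lower value means a more imbalanced classification problem
        "included_pct_of_screened": 100 * c.included / c.screened,
        # Full-text articles as a percentage of all articles screened
        "full_text_pct_of_screened": 100 * c.full_text / c.screened,
    }


# Hypothetical review: 5000 screened, 200 full texts, 20 included
example = ScreeningCounts(screened=5000, full_text=200, included=20)
measures = workload_measures(example)
```

In this made-up example, included studies are 0.4% of all screened articles, so a classifier trained on the screening decisions would see roughly 249 irrelevant articles for every relevant one. That ratio is the class imbalance the abstract refers to.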
