Journal of Biomedical Semantics

Ranking relations between diseases, drugs and genes for a curation task


Abstract

Background: Among the key pieces of information that biomedical text mining systems are expected to extract from the literature are interactions among different types of biomedical entities (proteins, genes, diseases, drugs, etc.). Several large resources of curated relations between biomedical entities are currently available, such as the Pharmacogenomics Knowledge Base (PharmGKB) and the Comparative Toxicogenomics Database (CTD). Biomedical text mining systems, particularly those that extract relationships among entities, could make better use of this wealth of already curated material.

Results: We propose a simple and effective method based on logistic regression (also known as maximum entropy modeling) for an optimized ranking of relation candidates utilizing curated abstracts. Furthermore, we examine the effects and difficulties of using widely available metadata (i.e. MeSH terms and chemical substance index terms) for relation extraction. In cross-validation experiments, ranking quality measured by AUCiP/R improves by 39% (PharmGKB) and 116% (CTD) over a frequency-based baseline of 0.39 (PharmGKB) and 0.21 (CTD). For the TAP-10 metric, we achieve an improvement of 53% (PharmGKB) and 134% (CTD) over the same baseline system (0.21 for PharmGKB and 0.15 for CTD).

Conclusions: Our experiments with the PharmGKB and CTD databases show that ranking relation candidates benefits strongly from the large body of curated relations in currently available knowledge bases. The tasks of concept identification and candidate relation generation profit from adaptation to previously curated material. The result is an effective and practical method for conservative extension and re-validation of biomedical relations from texts, which has been successfully used for curation experiments with the PharmGKB and CTD databases.
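To make the contrast between a frequency-based baseline and a logistic-regression ranking concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the three features (same-sentence co-occurrence, normalized co-occurrence frequency, and whether the pair already appears in a curated resource), the toy training data, and the entity names are all hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.5, epochs=2000, l2=0.01):
    """Fit L2-regularized logistic regression with batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Estimated probability that a candidate relation is a true relation."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical features per candidate pair:
# [co-occurs in one sentence (0/1), normalized frequency, pair seen in curated data (0/1)]
X_train = [
    [1, 0.9, 1], [1, 0.2, 1], [1, 0.3, 0], [0, 0.4, 1],  # curated (positive) relations
    [0, 0.8, 0], [0, 0.2, 0], [0, 0.9, 0], [0, 0.5, 0],  # non-relations
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train(X_train, y_train)

# Hypothetical candidates extracted from a new abstract (placeholder names).
candidates = {
    ("drugA", "geneX"): [1, 0.2, 1],
    ("drugB", "diseaseY"): [0, 0.9, 0],
    ("geneZ", "diseaseY"): [1, 0.5, 0],
}

# Learned ranking: sort by predicted probability, highest first.
model_ranking = sorted(
    candidates, key=lambda pair: predict(w, b, candidates[pair]), reverse=True
)
# Frequency baseline: sort by the frequency feature alone.
baseline_ranking = sorted(
    candidates, key=lambda pair: candidates[pair][1], reverse=True
)

print("model:   ", model_ranking)
print("baseline:", baseline_ranking)
```

In this toy setup the frequency feature carries little signal, so the learned model promotes the infrequent but otherwise well-supported pair that the baseline places last. Real systems would derive features from the text and metadata and evaluate with cross-validation, as the abstract describes.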
