Database: The Journal of Biological Databases and Curation
Efficient chemical-disease identification and relationship extraction using Wikipedia to improve recall

Abstract

Awareness of the adverse effects of chemicals is important in biomedical research and healthcare. Text mining can allow timely and low-cost extraction of this knowledge from the biomedical literature. We extended our text mining solution, LeadMine, to identify diseases and chemical-induced disease relationships (CIDs). LeadMine is a dictionary/grammar-based entity recognizer and was used to recognize and normalize both chemicals and diseases to Medical Subject Headings (MeSH) IDs. The disease lexicon was obtained from three sources: MeSH, the Disease Ontology and Wikipedia. The Wikipedia dictionary was derived from pages with a disease/symptom box, or those where the page title appeared in the lexicon. Composite entities (e.g. heart and lung disease) were detected and mapped to their composite MeSH IDs. For CIDs, we developed a simple pattern-based system to find relationships within the same sentence. Our system was evaluated in the BioCreative V Chemical–Disease Relation task and achieved very good results for both disease concept ID recognition (F1-score: 86.12%) and CIDs (F1-score: 52.20%) on the test set. As our system was over an order of magnitude faster than other solutions evaluated on the task, we were able to apply the same system to the entirety of MEDLINE allowing us to extract a collection of over 250 000 distinct CIDs.
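
The abstract describes dictionary-based entity recognition followed by simple pattern matching within a single sentence. As a rough illustration of that general idea only, here is a minimal Python sketch. It is not LeadMine's implementation: the lexicons, the placeholder concept IDs (which are not real MeSH IDs), the trigger patterns and the function names are all assumptions made up for this example.

```python
import re

# Toy lexicons with placeholder IDs (hypothetical, not real MeSH identifiers).
CHEMICAL_LEXICON = {"cisplatin": "CHEM_1", "lithium": "CHEM_2"}
DISEASE_LEXICON = {"nephrotoxicity": "DIS_1", "tremor": "DIS_2"}

# Hypothetical trigger patterns; {c} is a chemical mention, {d} a disease mention.
PATTERNS = [
    "{c}-induced {d}",
    "{c} induced {d}",
    "{d} caused by {c}",
    "{d} following {c}",
]


def find_entities(sentence, lexicon):
    """Return (term, concept_id) pairs for lexicon terms found as whole words."""
    lowered = sentence.lower()
    found = []
    for term, concept_id in lexicon.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.append((term, concept_id))
    return found


def extract_cids(text):
    """Return the set of (chemical_id, disease_id) pairs matched within one sentence."""
    relations = set()
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        chemicals = find_entities(sentence, CHEMICAL_LEXICON)
        diseases = find_entities(sentence, DISEASE_LEXICON)
        for chem, chem_id in chemicals:
            for dis, dis_id in diseases:
                # Keep the pair only if some trigger pattern links the two mentions.
                if any(p.format(c=chem, d=dis) in lowered for p in PATTERNS):
                    relations.add((chem_id, dis_id))
    return relations


if __name__ == "__main__":
    sample = "Cisplatin-induced nephrotoxicity was observed. Tremor caused by lithium is common."
    print(sorted(extract_cids(sample)))
```

Restricting matches to one sentence and to explicit trigger phrases favors precision and speed over recall; a chemical and a disease that merely co-occur in a sentence without a trigger phrase are not reported by this sketch.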