Journal: Engineering Applications of Artificial Intelligence

Pattern-based bootstrapping framework for biomedical relation extraction

Abstract

The progress made in the realm of '-omics' technologies has led to a tremendous increase in the volume of published biomedical research. Extracting information from this huge unstructured mass of data requires automation through text mining methods. Biomedical relation extraction is one such vital automated process for extracting biomedical relations hidden in scientific literature. In the recent past, several supervised machine learning methods have been used to identify biomedical relations. However, given the variations in textual expression, huge corpus size and small task-specific training data, semi-supervised techniques appear to perform better. To this end, we propose a system that uses the semi-supervised bootstrapping algorithm to extract biomedical relations from text. The unlabelled corpus contains sentences with biomedical entities, represented as patterns with the dependency tree feature. Bootstrapping starts with a seed set and iteratively learns new patterns from the unlabelled corpus. We have designed a three-level masking technique to generate new patterns, and incorporated three types of scoring to help select appropriate patterns. The pattern-based bootstrapping method performs well with a minimum seed set. The system is able to extract 37,450 patterns from the unlabelled corpus that represent different biomedical relations. These patterns, in turn, are able to identify 460,886 relation pairs with 1,327 single and 1,012 coupled trigger words that convey the semantics of the biomedical relation. More than 64% of the identified relations have evidence in the CTD database.
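As a rough illustration of the seed-and-iterate idea described in the abstract, the sketch below shows a generic pattern-based bootstrapping loop. It is not the authors' system. It assumes a pattern is just the surface text between two entities (the paper uses dependency-tree patterns with three-level masking), and it uses a single precision-style score in place of the paper's three scoring types. The corpus, the function name `bootstrap`, and the parameters `min_score` and `max_iters` are all hypothetical.

```python
# Toy corpus of (entity1, connecting text, entity2) triples. Illustrative data only.
CORPUS = [
    ("aspirin", "inhibits", "COX1"),
    ("aspirin", "blocks", "COX1"),
    ("ibuprofen", "inhibits", "COX2"),
    ("ibuprofen", "blocks", "COX2"),
    ("caffeine", "was measured alongside", "COX2"),
]


def bootstrap(corpus, seed_pairs, min_score=0.5, max_iters=5):
    """Grow a set of patterns and relation pairs from a small seed set.

    Assumption: a 'pattern' is the literal connecting text; the score is the
    fraction of a pattern's matches that are already known relation pairs.
    """
    known_pairs = set(seed_pairs)
    patterns = set()
    for _ in range(max_iters):
        # Step 1: candidate patterns are contexts that link an already known pair.
        candidates = {ctx for e1, ctx, e2 in corpus if (e1, e2) in known_pairs}
        candidates -= patterns

        # Step 2: keep candidates whose matches are mostly known pairs.
        accepted = set()
        for pat in candidates:
            matches = {(e1, e2) for e1, ctx, e2 in corpus if ctx == pat}
            score = len(matches & known_pairs) / len(matches)
            if score >= min_score:
                accepted.add(pat)
        if not accepted:
            break  # no new patterns learned, so the loop has converged
        patterns |= accepted

        # Step 3: apply all accepted patterns to harvest new relation pairs.
        new_pairs = {(e1, e2) for e1, ctx, e2 in corpus if ctx in patterns}
        if new_pairs <= known_pairs:
            break  # patterns yielded nothing new
        known_pairs |= new_pairs
    return patterns, known_pairs


patterns, pairs = bootstrap(CORPUS, {("aspirin", "COX1")})
print(sorted(patterns))
print(sorted(pairs))
```

Starting from a single seed pair, the loop learns the contexts that link it, then uses those contexts to find further pairs. The unrelated context "was measured alongside" is never learned, because it links no known pair.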