Journal: Knowledge-Based Systems

Stacked ensemble coupled with feature selection for biomedical entity extraction

Abstract

Entity extraction is one of the most fundamental tasks in biomedical information extraction. In this paper we propose a two-stage algorithm for extracting biomedical entities, in the form of gene and gene product mentions, from text. Several approaches have emerged, but most state-of-the-art work suggests that an individual system may not cover entity representations with an arbitrary set of features and therefore cannot achieve the best performance. We identify and implement a diverse set of features that are relevant for identifying biomedical entities and classifying them into predefined categories. An important property of these features is that they are identified and selected largely without using any domain knowledge. In the first stage we use a genetic algorithm (GA) based feature selection technique to determine the most relevant set of features for Support Vector Machine (SVM) and Conditional Random Field (CRF) classifiers. The GA-based feature selection algorithm produces a best population that can be used to generate different classification models based on CRF and SVM. In the second stage we develop a stacking-based ensemble to combine the classifiers selected in the first stage. The proposed approach is evaluated on two benchmark datasets, namely the JNLPBA 2004 shared task and GENETAG. It yields overall F-measure values of 75.17% and 94.70% for the JNLPBA 2004 and GENETAG datasets, respectively.
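The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the GA operators, the fitness function, or the meta-learner, so everything below is an assumption. `ga_select` uses truncation selection, single-point crossover and bit-flip mutation over binary feature masks; `toy_fitness` is a hypothetical stand-in for the cross-validated F-measure of a CRF or SVM trained on the masked features; and `fit_stacker` uses a simple lookup table as the meta-learner in place of whatever model the paper actually stacks with.

```python
import random
from collections import Counter, defaultdict

def ga_select(n_features, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Stage 1 (sketch): evolve binary feature masks, return the final population, best first."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]  # keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)            # single-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation: each bit flips with probability p_mut
            child = tuple(bit ^ (rng.random() < p_mut) for bit in child)
            children.append(child)
        pop = elite + children
    return sorted(pop, key=fitness, reverse=True)

def toy_fitness(mask, useful=(0, 2, 5)):
    """Hypothetical fitness: reward 'useful' features, lightly penalise mask size.
    In a real system this would be the F-measure of a classifier trained on the mask."""
    return sum(mask[i] for i in useful) - 0.1 * sum(mask)

def fit_stacker(base_preds, gold):
    """Stage 2 (sketch): learn, on held-out data, which gold label most often
    accompanies each combination of base-classifier outputs."""
    table = defaultdict(Counter)
    for combo, label in zip(zip(*base_preds), gold):
        table[combo][label] += 1
    return {combo: counts.most_common(1)[0][0] for combo, counts in table.items()}

def stack_predict(meta, base_preds):
    """Apply the meta-learner; fall back to a majority vote for unseen combinations."""
    return [meta.get(combo, Counter(combo).most_common(1)[0][0])
            for combo in zip(*base_preds)]

# Stage 1: each mask in the final population could define one CRF or SVM model.
population = ga_select(n_features=8, fitness=toy_fitness)

# Stage 2: combine two hypothetical base taggers on toy BIO-labelled tokens.
gold    = ["B", "I", "O", "O", "B", "O"]
model_a = ["B", "O", "O", "O", "B", "O"]
model_b = ["B", "I", "O", "B", "O", "O"]
meta = fit_stacker([model_a, model_b], gold)
combined = stack_predict(meta, [["B", "O"], ["O", "I"]])
```

Because the better half of each generation is carried over unchanged, the best fitness in the population never decreases. The masks near the top of `population` play the role of the "best population" from which several differently configured classifiers are built before stacking.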
