Stacked ensemble coupled with feature selection for biomedical entity extraction

Asif Ekbal; Sriparna Saha




Abstract

Entity extraction is one of the most fundamental and important tasks in biomedical information extraction. In this paper we propose a two-stage algorithm for extracting biomedical entities, in the form of gene and gene product mentions, from text. Several different approaches have emerged, but most of these state-of-the-art approaches suggest that an individual system may not cover entity representations with an arbitrary set of features and cannot achieve the best performance. We identify and implement a diverse set of features that are relevant for identifying biomedical entities and classifying them into predefined categories. The most important criterion for these features is that they are identified and selected largely without using any domain knowledge. In the first stage we use a genetic algorithm (GA) based feature selection technique to determine the most relevant set of features for Support Vector Machine (SVM) and Conditional Random Field (CRF) classifiers. The GA-based feature selection algorithm produces a best population that can be used to generate different classification models based on CRF and SVM. In the second stage we develop a stacked ensemble to combine the classifiers selected in the first stage. The proposed approach is evaluated on two benchmark datasets, namely the JNLPBA 2004 shared task and GENETAG. It yields overall F-measure values of 75.17% and 94.70% for the JNLPBA 2004 and GENETAG datasets, respectively.
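The first stage described in the abstract can be illustrated with a minimal sketch of GA-based feature selection over binary feature masks. This is not the authors' implementation: the function names, the GA settings (population size, tournament selection, one-point crossover, bit-flip mutation, elitism) and the toy fitness function are all assumptions made for illustration. In a real setting the fitness of a mask would be something like the F-measure of a CRF or SVM model trained with only the selected features and scored on held-out data.

```python
import random


def ga_feature_selection(n_features, fitness, pop_size=20, generations=30,
                         crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Evolve binary feature masks; return the final population, best first.

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = [ranked[0][:], ranked[1][:]]  # elitism: keep the two best masks
        while len(next_gen) < pop_size:
            # Binary tournament selection of two parents.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    return sorted(population, key=fitness, reverse=True)


# Hypothetical stand-in for "train a classifier on the masked features and
# measure F-measure": features 0, 2 and 5 help, every other feature hurts.
USEFUL = {0, 2, 5}


def toy_fitness(mask):
    hits = sum(1 for i in USEFUL if mask[i])
    noise = sum(1 for i, g in enumerate(mask) if g and i not in USEFUL)
    return hits - 0.5 * noise


final_population = ga_feature_selection(8, toy_fitness)
best = final_population[0]
print(best, toy_fitness(best))
```

Under these assumptions, each mask in `final_population` would define the feature set of one base model (CRF or SVM), and the second stage would train a meta-classifier on the base models' outputs to form the stacked ensemble.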


Bibliographic record

Source
Knowledge-Based Systems | 2013, Issue 7 | pp. 22-32 | 11 pages
Authors
Asif Ekbal; Sriparna Saha
Author affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna 800 013, Bihar, India;

Department of Computer Science and Engineering, Indian Institute of Technology Patna, Patna 800 013, Bihar, India;

Indexing information
Original format: PDF
Language: English (eng)
Chinese Library Classification
Keywords
Biomedical entity extraction; GA based feature selection; Support Vector Machine (SVM); Conditional Random Field (CRF); Stack based Ensemble;


Similar literature

1. Information theoretic-PSO-based feature selection: an application in biomedical entity extraction [J]. Shweta Yadav, Asif Ekbal, Sriparna Saha. Knowledge and Information Systems. 2019, Issue 3.
2. Feature selection for entity extraction from multiple biomedical corpora: A PSO-based approach [J]. Shweta Yadav, Asif Ekbal, Sriparna Saha. Soft Computing: A Fusion of Foundations, Methodologies and Applications. 2018, Issue 20.
3. Feature selection techniques for maximum entropy based biomedical named entity recognition [J]. Saha SK, Sarkar S, Mitra P. Journal of Biomedical Informatics. 2009, Issue 5.
4. Entity Extraction in Biomedical Corpora: An Approach to Evaluate Word Embedding Features with PSO based Feature Selection [C]. Shweta Yadav, Asif Ekbal, Sriparna Saha. Conference of the European Chapter of the Association for Computational Linguistics. 2017.
5. Advancing Biomedical Named Entity Recognition with Multivariate Feature Selection and Semantically Motivated Features [D]. Leaman, James Robert, Jr. 2013.
6. A Novel Feature Selection Strategy for Enhanced Biomedical Event Extraction Using the Turku System [O]. Jingbo Xia, Alex Chengyu Fang, Xing Zhang.
7. Entity Extraction in Biomedical Corpora: An Approach to Evaluate Word Embedding Features with PSO based Feature Selection [O]. Shweta Yadav, Asif Ekbal, Sriparna Saha. 2017.
