Identifying the best data-driven feature selection method for boosting reproducibility in classification tasks


Abstract

Considering the proliferation of extremely high-dimensional data in many domains, including computer vision and healthcare applications such as computer-aided diagnosis (CAD), advanced techniques are needed for reducing data dimensionality and identifying the most relevant features for a given classification task, such as distinguishing between healthy and disordered brain states. Despite the existence of many works on boosting classification accuracy using a particular feature selection (FS) method, choosing the best one from a large pool of existing FS techniques for boosting feature reproducibility within a dataset of interest remains a formidable challenge. Notably, good performance of a particular FS method does not necessarily imply that the experiment is reproducible or that the features identified are optimal for the entirety of the samples. This paper presents the first attempt to address the following challenge: "Given a set of different feature selection methods {FS1, ..., FSK} and a dataset of interest, how to identify the most reproducible and 'trustworthy' connectomic features that would produce reliable biomarkers capable of accurately differentiating between two specific conditions?" To this aim, we propose the FS-Select framework, which explores the relationships among the different FS methods using a multi-graph architecture based on the feature reproducibility power, average accuracy, and feature stability of each FS method. By extracting the 'central' graph node, we identify the most reliable and reproducible FS method for the target brain state classification task, along with the most discriminative features fingerprinting these brain states.
To evaluate the reproducibility power of FS-Select, we perturbed the training set by using different cross-validation strategies on a multi-view small-scale connectomic dataset (late mild cognitive impairment vs Alzheimer's disease) and a large-scale dataset including autistic vs healthy subjects. Our experiments revealed reproducible connectional features fingerprinting disordered brain states. (C) 2020 Elsevier Ltd. All rights reserved.
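
The abstract names the ingredients of FS-Select (a multi-graph over FS methods built from reproducibility, accuracy, and stability, followed by extraction of a 'central' node) but does not give the construction. The following is only a minimal sketch of that general idea under loud assumptions, not the authors' implementation: the edge definitions (top-k feature overlap, closeness of scores), the layer averaging, and the use of node strength as the centrality measure are all illustrative choices, and every function name and number below is hypothetical.

```python
import numpy as np

def top_k_overlap(features_a, features_b):
    """Fraction of features shared by two top-k lists (assumed equal length)."""
    a, b = set(features_a), set(features_b)
    return len(a & b) / max(len(a), 1)

def overlap_graph(top_features):
    """K x K layer: edge (i, j) is the top-k overlap of FS methods i and j.
    Assumption: overlap stands in for 'feature reproducibility'."""
    k = len(top_features)
    graph = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            graph[i, j] = graph[j, i] = top_k_overlap(top_features[i], top_features[j])
    return graph

def agreement_graph(scores):
    """K x K layer: edge (i, j) is 1 - |s_i - s_j| for scores in [0, 1].
    Assumption: two methods are 'close' when their scores are close."""
    scores = np.asarray(scores, dtype=float)
    graph = 1.0 - np.abs(scores[:, None] - scores[None, :])
    np.fill_diagonal(graph, 0.0)  # no self-loops
    return graph

def select_central_method(graphs):
    """Average the layers and return (index of strongest node, all strengths).
    Node strength (row sum) is one simple centrality; others are possible."""
    merged = np.mean(np.stack(graphs), axis=0)
    strength = merged.sum(axis=1)
    return int(np.argmax(strength)), strength

# Hypothetical inputs for K = 4 FS methods (made-up numbers, not paper results).
top_features = [[0, 1, 2, 3], [0, 1, 2, 4], [0, 1, 5, 6], [7, 8, 9, 10]]
accuracy = [0.80, 0.78, 0.75, 0.60]   # average accuracy per method
stability = [0.70, 0.71, 0.65, 0.40]  # feature stability per method

layers = [overlap_graph(top_features),
          agreement_graph(accuracy),
          agreement_graph(stability)]
best, strength = select_central_method(layers)
print("most central FS method index:", best)
print("node strengths:", np.round(strength, 3))
```

Note that a centrality built from agreement favors the method most consistent with the others, which is not necessarily the one with the highest accuracy; whether that matches the paper's notion of the 'central' node is an assumption of this sketch.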


Bibliographic details

Source: Pattern Recognition: The Journal of the Pattern Recognition Society | 2020, No. 2020 | 14 pages
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords: Feature selection methods; Multi-graph topological analysis; Feature reproducibility; Biomarker discovery; Morphological brain network; Neurological disorders; Connectomics; Cross-validation

Similar literature

1. Identifying the best data-driven feature selection method for boosting reproducibility in classification tasks [J]. Pattern Recognition: The Journal of the Pattern Recognition Society. 2020.
2. Assessment of feature selection and classification methods for recognizing motor imagery tasks from electroencephalographic signals [J]. Roberto Vega, Touqir Sajed, Kory Wallace Mathewson. Artificial Intelligence Research. 2017, No. 1.
3. Performance enhancement of mental task classification using EEG signal: a study of multivariate feature selection methods [J]. Gupta Akshansh, Agrawal R. K., Kaur Baljeet. Soft Computing: A Fusion of Foundations, Methodologies and Applications. 2015, No. 10.
4. Feature Selection methods applied to Motor Imagery task classification [C]. Alimed Celecia Ramos, René González Hernández, Marley Vellasco. IEEE Latin American Conference on Computational Intelligence. 2016.
5. Classification and variable selection for high dimensional multivariate binary data: Adaboost based new methods and a theory for the plug-in rule [D]. Park, Junyong. 2006.
6. Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes [O]. Thanyaluk Jirapech-Umpai, Stuart Aitken. 2005.
7. Identifying the best data-driven feature selection method for boosting reproducibility in classification tasks [O]. Nicolas Georges, Islem Mhiri, Islem Rekik. 2020.
