Journal: Pattern Recognition: The Journal of the Pattern Recognition Society

Identifying the best data-driven feature selection method for boosting reproducibility in classification tasks

Abstract

The proliferation of extremely high-dimensional data in many domains, including computer vision and healthcare applications such as computer-aided diagnosis (CAD), calls for advanced techniques that reduce data dimensionality and identify the most relevant features for a given classification task, such as distinguishing between healthy and disordered brain states. Although many works boost classification accuracy using a particular feature selection (FS) method, choosing the best one from a large pool of existing FS techniques to boost feature reproducibility within a dataset of interest remains a formidable challenge. Notably, good performance of a particular FS method does not necessarily imply that the experiment is reproducible or that the identified features are optimal for the entirety of the samples. This paper presents the first attempt to address the following challenge: "Given a set of different feature selection methods {FS_1, ..., FS_K} and a dataset of interest, how can we identify the most reproducible and 'trustworthy' connectomic features that would produce reliable biomarkers capable of accurately differentiating between two specific conditions?" To this end, we propose the FS-Select framework, which explores the relationships among the different FS methods using a multi-graph architecture based on the feature reproducibility power, average accuracy, and feature stability of each FS method. By extracting the 'central' graph node, we identify the most reliable and reproducible FS method for the target brain state classification task, along with the most discriminative features fingerprinting these brain states.
To evaluate the reproducibility power of FS-Select, we perturbed the training set using different cross-validation strategies on a multi-view small-scale connectomic dataset (late mild cognitive impairment vs. Alzheimer's disease) and a large-scale dataset including autistic vs. healthy subjects. Our experiments revealed reproducible connectional features fingerprinting disordered brain states. © 2020 Elsevier Ltd. All rights reserved.
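
The abstract does not give the exact graph definitions, so the following is only a minimal sketch of the general idea (several graphs over FS methods, then a "central" node), not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the use of Jaccard overlap as a stand-in for reproducibility and stability, the closeness-based accuracy graph, the unweighted averaging of the three graphs, and node strength as the centrality measure.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard overlap between two collections of feature indices."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def overlap_graph(selections):
    """Assumed reproducibility graph: mean per-run feature overlap between methods.

    selections[m] is a list of top-k feature index lists, one per CV run.
    """
    k = len(selections)
    g = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            scores = [jaccard(a, b) for a, b in zip(selections[i], selections[j])]
            g[i, j] = g[j, i] = float(np.mean(scores))
    return g

def stability(runs):
    """Assumed stability score: mean pairwise overlap across runs of one method."""
    pairs = [(i, j) for i in range(len(runs)) for j in range(i + 1, len(runs))]
    return float(np.mean([jaccard(runs[i], runs[j]) for i, j in pairs]))

def closeness_graph(values):
    """Assumed similarity graph for scalar scores in [0, 1]: 1 - |v_i - v_j|."""
    v = np.asarray(values, dtype=float)
    return 1.0 - np.abs(v[:, None] - v[None, :])

def most_central_method(selections, accuracies):
    """Average the three graphs and return the node with the largest strength."""
    graphs = [
        overlap_graph(selections),
        closeness_graph(accuracies),
        closeness_graph([stability(runs) for runs in selections]),
    ]
    combined = np.mean(graphs, axis=0)
    np.fill_diagonal(combined, 0.0)   # ignore self-similarity
    strength = combined.sum(axis=1)   # node strength as a simple centrality
    return int(np.argmax(strength)), strength

# Toy input (made up): 3 FS methods, 2 cross-validation runs each, top-3 features.
selections = [
    [[0, 1, 2], [0, 1, 3]],
    [[0, 1, 2], [0, 1, 2]],
    [[7, 8, 9], [6, 8, 9]],
]
accuracies = [0.80, 0.82, 0.60]
best, strength = most_central_method(selections, accuracies)
print("most central FS method:", best)
```

Node strength is used here only because it is the simplest centrality to compute; any other centrality measure could be swapped in without changing the overall structure of the sketch.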
