Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

A Distributed Feature Selection Algorithm Based on Distance Correlation with an Application to Microarrays


Abstract

DNA microarray datasets are characterized by a large number of features with very few samples, which is a typical cause of overfitting and poor generalization in the classification task. Here, we introduce a novel feature selection (FS) approach which employs the distance correlation (dCor) as a criterion for evaluating the dependence of the class on a given feature subset. The dCor index provides a reliable dependence measure among random vectors of arbitrary dimension, without any assumption on their distribution. Moreover, it is sensitive to the presence of redundant terms. The proposed FS method is based on a probabilistic representation of the feature subset model, which is progressively refined by a repeated process of model extraction and evaluation. A key element of the approach is a distributed optimization scheme based on a vertical partitioning of the dataset, which alleviates the negative effects of its unbalanced dimensions. The proposed method has been tested on several microarray datasets, resulting in quite compact and accurate models obtained at a reasonable computational cost.
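
As a rough illustration of the selection criterion only, the sample distance correlation between a candidate feature subset and the class labels can be sketched as follows. This is the standard sample estimator built from double-centered pairwise distance matrices, not the authors' implementation; the function names, the `select_columns` helper, and the toy data are all hypothetical, and the probabilistic subset model and distributed optimization scheme described above are not reproduced here.

```python
import math


def centered_distance_matrix(rows):
    """Pairwise Euclidean distances between rows, double-centered."""
    n = len(rows)
    d = [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d]
    grand_mean = sum(row_mean) / n
    # d is symmetric, so the column means equal the row means.
    return [
        [d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def mean_product(a, b):
    """Mean of the element-wise product of two n x n matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x_rows, y_rows):
    """Sample dCor between two sets of row vectors with the same number of rows."""
    a = centered_distance_matrix(x_rows)
    b = centered_distance_matrix(y_rows)
    dcov2 = mean_product(a, b)                        # squared distance covariance
    dvar2 = mean_product(a, a) * mean_product(b, b)   # product of squared distance variances
    if dvar2 <= 0.0:
        return 0.0  # one side is constant, so dependence is undefined; report 0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar2))


def select_columns(data, columns):
    """Hypothetical helper: keep only the chosen feature columns of each sample."""
    return [tuple(row[c] for c in columns) for row in data]


# Toy data (made up for illustration): 6 samples, 3 features, binary class labels.
data = [
    (0.1, 5.0, 1.2),
    (0.2, 4.1, 0.7),
    (0.3, 5.3, 1.9),
    (1.1, 4.7, 0.4),
    (1.3, 5.2, 1.5),
    (1.2, 4.4, 0.9),
]
labels = [(0,), (0,), (0,), (1,), (1,), (1,)]

for subset in [(0,), (1,), (0, 2)]:
    score = distance_correlation(select_columns(data, subset), labels)
    print(subset, round(score, 3))
```

Because the estimator works on distance matrices between samples, the feature subset can have any number of columns while the cost per evaluation stays quadratic in the number of samples, which is small for microarray data.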