Journal: IEEE Transactions on Knowledge and Data Engineering

A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data



Abstract

Feature selection involves identifying a subset of the most useful features that produces results comparable to those of the original entire set of features. A feature selection algorithm may be evaluated from both the efficiency and effectiveness points of view. While efficiency concerns the time required to find a subset of features, effectiveness relates to the quality of that subset. Based on these criteria, a fast clustering-based feature selection algorithm (FAST) is proposed and experimentally evaluated in this paper. The FAST algorithm works in two steps. In the first step, features are divided into clusters using graph-theoretic clustering methods. In the second step, the most representative feature, i.e., the one most strongly related to the target classes, is selected from each cluster to form the subset of features. Because features in different clusters are relatively independent, the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we adopt the efficient minimum-spanning tree (MST) clustering method. The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical study. Extensive experiments compare FAST with several representative feature selection algorithms, namely, FCBF, ReliefF, CFS, Consist, and FOCUS-SF, with respect to four types of well-known classifiers, namely, the probability-based Naive Bayes, the tree-based C4.5, the instance-based IB1, and the rule-based RIPPER, before and after feature selection. The results, on 35 publicly available real-world high-dimensional image, microarray, and text datasets, demonstrate that FAST not only produces smaller subsets of features but also improves the performance of the four types of classifiers.
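The two-step procedure described in the abstract can be sketched in a few dozen lines. The sketch below is an illustration, not the authors' implementation: it uses the absolute Pearson correlation as a stand-in for the paper's information-theoretic relevance measure (symmetric uncertainty), builds an MST over the complete feature graph with Kruskal's algorithm, cuts weak MST edges to form clusters, and keeps the feature most relevant to the target from each cluster. The `threshold` parameter is an assumption introduced here for the edge-cutting step.

```python
import numpy as np

def fast_feature_selection(X, y, threshold=0.5):
    """Sketch of an MST-clustering feature selector in the spirit of FAST.

    Assumption: |Pearson correlation| replaces the paper's
    symmetric-uncertainty measure, for brevity.
    """
    n_features = X.shape[1]
    # Feature-feature similarity and feature-target relevance.
    corr = np.abs(np.corrcoef(X, rowvar=False))                       # (d, d)
    rel = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])

    # Step 1a: Kruskal's MST over the complete feature graph.
    # Edge weight = 1 - similarity, so highly correlated features join first.
    edges = sorted(
        (1.0 - corr[i, j], i, j)
        for i in range(n_features) for j in range(i + 1, n_features)
    )
    parent = list(range(n_features))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    mst = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            mst.append((w, i, j))

    # Step 1b: drop MST edges heavier than the threshold; the remaining
    # forest's connected components are the feature clusters.
    parent = list(range(n_features))
    for w, i, j in mst:
        if w <= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj

    # Step 2: from each cluster keep the feature most relevant to the target.
    clusters = {}
    for j in range(n_features):
        clusters.setdefault(find(j), []).append(j)
    return sorted(max(members, key=lambda j: rel[j])
                  for members in clusters.values())
```

For example, given two nearly duplicated features and one independent feature, the two duplicates end up in one cluster and only a single representative of the pair is retained, which is the redundancy-removal behavior the abstract attributes to the clustering step.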
