Journal: IEEE Transactions on Knowledge and Data Engineering

A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data



Abstract

Feature selection involves identifying a subset of the most useful features that produces results comparable to those of the original entire set of features. A feature selection algorithm may be evaluated from both the efficiency and effectiveness points of view. While efficiency concerns the time required to find a subset of features, effectiveness relates to the quality of that subset. Based on these criteria, a fast clustering-based feature selection algorithm (FAST) is proposed and experimentally evaluated in this paper. The FAST algorithm works in two steps. In the first step, features are divided into clusters using graph-theoretic clustering methods. In the second step, the most representative feature, i.e., the one most strongly related to the target classes, is selected from each cluster to form the subset of features. Because features in different clusters are relatively independent, the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we adopt the efficient minimum-spanning tree (MST) clustering method. The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical study. Extensive experiments compare FAST with several representative feature selection algorithms, namely, FCBF, ReliefF, CFS, Consist, and FOCUS-SF, with respect to four types of well-known classifiers, namely, the probability-based Naive Bayes, the tree-based C4.5, the instance-based IB1, and the rule-based RIPPER, before and after feature selection. The results, on 35 publicly available real-world high-dimensional image, microarray, and text datasets, demonstrate that FAST not only produces smaller subsets of features but also improves the performance of the four types of classifiers.
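The two-step procedure described in the abstract can be sketched in a few dozen lines. The sketch below is an illustration, not the authors' implementation: it uses the absolute Pearson correlation as a stand-in for the paper's information-theoretic relevance measure (symmetric uncertainty), builds an MST over the complete feature graph with Kruskal's algorithm, cuts weak MST edges to form clusters, and keeps the feature most relevant to the target from each cluster. The `threshold` parameter is an assumption introduced here for the edge-cutting step.

```python
import numpy as np

def fast_feature_selection(X, y, threshold=0.5):
    """Sketch of an MST-clustering feature selector in the spirit of FAST.

    Assumption: |Pearson correlation| replaces the paper's
    symmetric-uncertainty measure, for brevity.
    """
    n_features = X.shape[1]
    # Feature-feature similarity and feature-target relevance.
    corr = np.abs(np.corrcoef(X, rowvar=False))                       # (d, d)
    rel = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])

    # Step 1a: Kruskal's MST over the complete feature graph.
    # Edge weight = 1 - similarity, so highly correlated features join first.
    edges = sorted(
        (1.0 - corr[i, j], i, j)
        for i in range(n_features) for j in range(i + 1, n_features)
    )
    parent = list(range(n_features))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    mst = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            mst.append((w, i, j))

    # Step 1b: drop MST edges heavier than the threshold; the remaining
    # forest's connected components are the feature clusters.
    parent = list(range(n_features))
    for w, i, j in mst:
        if w <= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj

    # Step 2: from each cluster keep the feature most relevant to the target.
    clusters = {}
    for j in range(n_features):
        clusters.setdefault(find(j), []).append(j)
    return sorted(max(members, key=lambda j: rel[j])
                  for members in clusters.values())
```

For example, given two nearly duplicated features and one independent feature, the two duplicates end up in one cluster and only a single representative of the pair is retained, which is the redundancy-removal behavior the abstract attributes to the clustering step.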
