Pattern Recognition Letters

Feature weighting and selection with a Pareto-optimal trade-off between relevancy and redundancy


Abstract

Feature Selection (FS) is an important pre-processing step in machine learning: it reduces the number of features/variables used to describe each member of a dataset. Such reduction is achieved by eliminating non-discriminating and redundant features and selecting a subset of the existing features with higher discriminating power among the various classes in the data. In this paper, we formulate feature selection as a bi-objective optimization problem over real-valued weights, one per feature. A subset of the weighted features is then selected as the best subset for subsequent classification of the data. Two information-theoretic measures, known as 'relevancy' and 'redundancy', are chosen for designing the objective functions of a very competitive Multi-Objective Optimization (MOO) algorithm called the 'Multi-Objective Evolutionary Algorithm based on Decomposition (MOEA/D)'. We experimentally determine the best possible constraints on the weights to be optimized. We evaluate the proposed bi-objective feature selection and weighting framework on a set of 15 standard datasets by using the popular k-Nearest Neighbor (k-NN) classifier. As is evident from the experimental results, our method appears to be quite competitive with some of the state-of-the-art FS methods of current interest. We further demonstrate the effectiveness of our framework by changing the choices of the optimization scheme and the classifier to the Non-dominated Sorting Genetic Algorithm (NSGA)-II and Support Vector Machines (SVMs), respectively. (C) 2017 Elsevier B.V. All rights reserved.
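The abstract does not spell out the two objectives as code. The sketch below is one plausible reading, not the paper's implementation: relevancy is taken as the mean mutual information between each selected feature and the class labels, and redundancy as the mean pairwise mutual information among the selected features, with a feature counted as "selected" when its weight exceeds a threshold. The function names, the 0.5 threshold, and the plug-in discrete-MI estimator are all illustrative assumptions.

```python
import numpy as np

def mutual_info(x, y):
    """Plug-in estimate of I(X;Y) in nats for discrete-valued arrays."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))  # joint probability
            if pxy > 0.0:
                px = np.mean(x == xv)             # marginals
                py = np.mean(y == yv)
                mi += pxy * np.log(pxy / (px * py))
    return mi

def bi_objectives(weights, X_disc, y, threshold=0.5):
    """Evaluate one candidate weight vector as a bi-objective point.

    Returns (-relevancy, redundancy); both are to be minimized, so
    negating relevancy turns 'maximize relevancy' into a minimization.
    X_disc holds discretized features (columns); threshold is assumed.
    """
    sel = np.flatnonzero(weights > threshold)
    if sel.size == 0:
        return 0.0, 0.0
    # Relevancy: mean MI between each selected feature and the labels.
    relevancy = np.mean([mutual_info(X_disc[:, j], y) for j in sel])
    # Redundancy: mean MI over all pairs of selected features.
    if sel.size < 2:
        redundancy = 0.0
    else:
        pairs = [(i, j) for a, i in enumerate(sel) for j in sel[a + 1:]]
        redundancy = np.mean([mutual_info(X_disc[:, i], X_disc[:, j])
                              for i, j in pairs])
    return -relevancy, redundancy
```

A multi-objective optimizer such as MOEA/D or NSGA-II would call `bi_objectives` as its fitness function for each candidate weight vector, and the resulting Pareto front trades relevancy against redundancy; a duplicate of an informative feature raises redundancy without raising relevancy, so the front pushes such duplicates out.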
