Journal: Expert Systems with Applications

Feature subset selection Filter-Wrapper based on low quality data


Abstract

Feature selection is an active research area in machine learning. Its main idea is to choose a subset of the available features by eliminating features with little or no predictive information, as well as redundant features that are strongly correlated with others. Many approaches to feature selection exist, but most of them work only with crisp data, and few can work directly with both crisp and low quality (imprecise and uncertain) data. We therefore propose a new feature selection method that can handle both crisp and low quality data. The proposed approach is based on a Fuzzy Random Forest and integrates filter and wrapper methods into a sequential search procedure that improves the classification accuracy obtained with the selected features. The approach consists of the following main steps: (1) scaling and discretization of the feature set, with feature pre-selection based on the discretization process (filter); (2) ranking of the pre-selected features using the Fuzzy Decision Trees of a Fuzzy Random Forest ensemble; and (3) wrapper feature selection using a Fuzzy Random Forest ensemble based on cross-validation. The efficiency and effectiveness of the approach are demonstrated through several experiments using both high dimensional and low quality datasets. The approach performs well, in terms of both classification accuracy and the number of features selected, and behaves well on high dimensional datasets (microarray datasets) as well as on low quality datasets.
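The three steps can be illustrated with a minimal sketch on crisp data. This is not the authors' method: the abstract does not specify the discretization, the ranking measure, or the search details, so the sketch substitutes equal-width binning with information gain for the filter and ranking steps, and a cross-validated nearest-centroid classifier for the Fuzzy Random Forest in the wrapper step. All function names, the `min_gain` threshold, and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def discretize(column, bins=3):
    """Min-max scale a column and cut it into equal-width bins.
    Assumption: a crisp stand-in for the paper's discretization process."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    return [min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in column]


def info_gain(column, labels, bins=3):
    """Information gain of the discretized column with respect to the labels."""
    groups = {}
    for b, y in zip(discretize(column, bins), labels):
        groups.setdefault(b, []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def cv_accuracy(X, y, features, folds=3):
    """Cross-validated accuracy of a nearest-centroid classifier.
    Assumption: this simple classifier replaces the Fuzzy Random Forest."""
    correct = 0
    for k in range(folds):
        sums, counts = {}, Counter()
        for i in range(len(X)):
            if i % folds != k:  # training rows for this fold
                acc = sums.setdefault(y[i], [0.0] * len(features))
                for j, f in enumerate(features):
                    acc[j] += X[i][f]
                counts[y[i]] += 1
        centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        for i in range(len(X)):
            if i % folds == k:  # held-out rows for this fold
                pred = min(
                    centroids,
                    key=lambda c: sum(
                        (X[i][f] - m) ** 2 for f, m in zip(features, centroids[c])
                    ),
                )
                correct += pred == y[i]
    return correct / len(X)


def filter_wrapper(X, y, min_gain=0.05, folds=3):
    """Filter, rank, then run a sequential forward wrapper search."""
    gains = {f: info_gain([row[f] for row in X], y) for f in range(len(X[0]))}
    # (1) Filter: pre-select features whose discretization carries information.
    kept = [f for f in gains if gains[f] > min_gain]
    # (2) Ranking: order the pre-selected features by decreasing gain.
    ranked = sorted(kept, key=gains.get, reverse=True)
    # (3) Wrapper: add a feature only if cross-validated accuracy improves.
    selected, best = [], 0.0
    for f in ranked:
        score = cv_accuracy(X, y, selected + [f], folds)
        if score > best:
            selected, best = selected + [f], score
    return selected, best


# Toy data: feature 0 separates the classes, feature 1 is unrelated to the
# label, and feature 2 is constant.
y = [i % 2 for i in range(30)]
X = [[10.0 * y[i] + 0.1 * (i % 5), float(i * 7 % 11), 1.0] for i in range(30)]
selected, best = filter_wrapper(X, y)
print(selected, best)
```

Accepting a feature only when accuracy strictly improves keeps the selected subset small, which matches the abstract's emphasis on both accuracy and the number of features selected. Handling imprecise or uncertain values would require the fuzzy components that this sketch leaves out.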
