Journal of Theoretical and Applied Information Technology

DISTRIBUTED AND PROGRESSIVE FEATURE SELECTION ALGORITHM FOR HIGH DIMENSIONAL DATA: A MAP-REDUCE APPROACH


Abstract

Dimensionality reduction or feature selection is an essential pre-processing step before applying a machine learning algorithm to any data set. For medium-dimensional datasets it is optional or applied on demand, but for high-dimensional datasets it is mandatory, and its importance grows when accurate and relevant output is expected from the machine learning algorithm. Most existing methods fall into two types: dimensionality reduction and feature selection. The gap between the two is narrow. Dimensionality reduction relies on mathematical analysis with transformations, and its output may or may not be a subset of the original features. Feature selection is an application of feature engineering and requires domain knowledge. Any algorithm applicable to high-dimensional data, however, requires more processing time and storage resources. We take processing time as the basis of our problem statement and implement a distributed algorithm for feature selection, named the Distributed Progressive Feature Selection algorithm with KNN+ReliefF, for high-dimensional data. In this paper we apply the MapReduce concept to select the final subset of relevant features in a progressive manner. Simulation results show the features and their weights for various parameter settings.
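
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea: Relief-style feature weights computed independently on each data partition (the map step) and then combined (the reduce step). It uses basic Relief with a single nearest hit and nearest miss rather than full ReliefF with k neighbours, and it assumes numeric features scaled to [0, 1]. The function names `relief_partial`, `merge`, `feature_weights` and `top_k` are hypothetical and are not taken from the paper.

```python
from functools import reduce


def relief_partial(rows, labels):
    """Map step (assumed): Relief weight sums for one data partition.

    For each sample, find its nearest same-class neighbour (hit) and its
    nearest other-class neighbour (miss) within the partition, using
    Manhattan distance, and accumulate per-feature differences.
    """
    n_features = len(rows[0])
    sums = [0.0] * n_features
    count = 0
    for i, (x, y) in enumerate(zip(rows, labels)):
        hit = miss = None
        hit_d = miss_d = float("inf")
        for j, (z, t) in enumerate(zip(rows, labels)):
            if i == j:
                continue
            d = sum(abs(a - b) for a, b in zip(x, z))
            if t == y and d < hit_d:
                hit, hit_d = z, d
            elif t != y and d < miss_d:
                miss, miss_d = z, d
        if hit is None or miss is None:
            continue  # sample has no usable hit or miss in this partition
        for f in range(n_features):
            # A relevant feature differs across classes and agrees within one.
            sums[f] += abs(x[f] - miss[f]) - abs(x[f] - hit[f])
        count += 1
    return sums, count


def merge(a, b):
    """Reduce step (assumed): add weight sums and sample counts."""
    return [p + q for p, q in zip(a[0], b[0])], a[1] + b[1]


def feature_weights(partitions):
    """Average the per-partition sums into one weight per feature."""
    sums, count = reduce(merge, (relief_partial(r, l) for r, l in partitions))
    return [s / count for s in sums]


def top_k(weights, k):
    """Indices of the k highest-weighted features, best first."""
    order = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
    return order[:k]


# Toy data: feature 0 separates the classes, feature 1 is noise.
partitions = [
    ([[0.0, 0.5], [0.1, 0.9], [0.9, 0.4], [1.0, 0.8]], [0, 0, 1, 1]),
    ([[0.05, 0.1], [0.15, 0.7], [0.95, 0.2], [0.85, 0.6]], [0, 0, 1, 1]),
]
weights = feature_weights(partitions)
selected = top_k(weights, 1)
```

Because neighbours are searched only inside each partition, the result approximates Relief run on the full data set; this trades accuracy for the ability to process partitions in parallel. A progressive variant could call `top_k` repeatedly with a shrinking `k`, recomputing the weights on the surviving features each round, but whether the paper does this is not stated in the abstract.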
