An unsupervised approach to feature discretization and selection

Abstract

Many learning problems require handling high-dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality and need to address it in order to be effective. Examples of such data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the large number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning task. There is thus a need for adequate techniques for feature representation, reduction, and selection, both to improve classification accuracy and to reduce memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium- and high-dimensional datasets. Experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.
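The abstract does not spell out the proposed algorithms, so the following is only a minimal illustrative sketch of what an unsupervised discretize-then-select pipeline can look like. It assumes equal-width binning for discretization and variance ranking for selection; both are generic stand-ins chosen for illustration, not the paper's techniques, and the function names `discretize_equal_width` and `select_by_variance` are hypothetical. Neither step uses class labels, which is what makes the pipeline unsupervised.

```python
from statistics import pvariance


def discretize_equal_width(column, n_bins=4):
    """Map each value of one feature column to a bin index in [0, n_bins - 1].

    Assumption: equal-width bins over the observed range of the column.
    """
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no information; put everything in bin 0.
        return [0] * len(column)
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value falls in the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in column]


def select_by_variance(columns, k):
    """Return the indices of the k columns with the largest variance.

    Assumption: variance is used as a simple label-free relevance score.
    """
    ranked = sorted(
        range(len(columns)),
        key=lambda j: pvariance(columns[j]),
        reverse=True,
    )
    return ranked[:k]


# Toy data: 5 instances, 3 features, stored column-wise.
features = [
    [0.0, 0.1, 0.2, 0.9, 1.0],  # spread-out feature
    [5.0, 5.0, 5.0, 5.0, 5.0],  # constant feature
    [1.0, 2.0, 3.0, 4.0, 5.0],  # evenly spaced feature
]

binned = [discretize_equal_width(col, n_bins=4) for col in features]
kept = select_by_variance(binned, k=2)

print(binned)
print(kept)
```

On this toy data the constant feature collapses to a single bin and is ranked last, so the two non-constant features are kept. Storing each feature as a small integer bin index rather than a float is also where the memory saving mentioned in the abstract would come from in a scheme of this kind.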
