
Ranking Interesting Subspaces for Clustering High Dimensional Data


Abstract

Application domains such as the life sciences (e.g., molecular biology) produce tremendous amounts of data that can no longer be managed without efficient and effective data mining methods. One of the primary data mining tasks is clustering. However, traditional clustering algorithms often fail to detect meaningful clusters because the feature space of most real-world data sets is high dimensional and inherently sparse. Nevertheless, such data sets often contain clusters hidden in various subspaces of the original feature space. We present a pre-processing step for traditional clustering algorithms that detects all interesting subspaces of high-dimensional data containing clusters. For this purpose, we define a quality criterion for the interestingness of a subspace and propose an efficient algorithm called RIS (Ranking Interesting Subspaces) to examine all such subspaces. A broad evaluation on synthetic and real-world data sets shows empirically that RIS is well suited to finding all relevant subspaces in large, high-dimensional, sparse data and to ranking them accordingly.
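The abstract does not spell out the quality criterion or the search strategy, so the following is only a minimal sketch of the general idea of scoring and ranking subspaces before clustering. It is not the RIS algorithm itself. Everything in it is an assumption made for illustration: the function names, the parameters `eps`, `min_pts`, and `max_dim`, the max-norm neighborhood, the uniform-data baseline on the unit cube, and the brute-force enumeration of subspaces.

```python
from itertools import combinations
import random


def neighbor_count(points, i, dims, eps):
    """Count points within eps of points[i] (max-norm), looking only at `dims`."""
    p = points[i]
    return sum(
        1 for q in points
        if all(abs(p[d] - q[d]) <= eps for d in dims)
    )


def subspace_score(points, dims, eps, min_pts):
    """Hypothetical interestingness score, not the criterion from the paper.

    Sums the neighborhood sizes of all dense points (at least min_pts
    neighbors, the point itself included) and divides by the sum expected
    if the data were uniform on the unit cube.
    """
    n = len(points)
    counts = [neighbor_count(points, i, dims, eps) for i in range(n)]
    dense_total = sum(c for c in counts if c >= min_pts)
    # Under uniformity, an eps-box covers about (2 * eps) ** k of the cube.
    expected = n * n * min(1.0, (2 * eps) ** len(dims))
    return dense_total / expected


def rank_subspaces(points, eps=0.1, min_pts=5, max_dim=2):
    """Score every subspace with up to max_dim dimensions, best first."""
    n_dims = len(points[0])
    scored = []
    for k in range(1, max_dim + 1):
        for dims in combinations(range(n_dims), k):
            scored.append((subspace_score(points, dims, eps, min_pts), dims))
    return sorted(scored, reverse=True)


def make_demo_data(seed=0):
    """Synthetic data: a tight cluster in dimensions 0 and 1, plus uniform noise."""
    rng = random.Random(seed)
    clustered = [
        (rng.gauss(0.5, 0.02), rng.gauss(0.5, 0.02), rng.random())
        for _ in range(100)
    ]
    noise = [(rng.random(), rng.random(), rng.random()) for _ in range(100)]
    return clustered + noise


ranking = rank_subspaces(make_demo_data())
for score, dims in ranking:
    print(dims, round(score, 2))
```

On this synthetic data the subspace spanned by dimensions 0 and 1 should come out on top, since that is where the cluster was planted. The exhaustive enumeration is exponential in the number of dimensions, which is presumably why the abstract stresses an efficient algorithm; the sketch makes no attempt at that.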
