Ranking Interesting Subspaces for Clustering High Dimensional Data

Abstract

Application domains such as the life sciences (e.g., molecular biology) produce a tremendous amount of data that can no longer be managed without the help of efficient and effective data mining methods. One of the primary data mining tasks is clustering. However, traditional clustering algorithms often fail to detect meaningful clusters because of the high-dimensional, inherently sparse feature space of most real-world data sets. Nevertheless, the data sets often contain clusters hidden in various subspaces of the original feature space. We present a pre-processing step for traditional clustering algorithms, which detects all interesting subspaces of high-dimensional data containing clusters. For this purpose, we define a quality criterion for the interestingness of a subspace and propose an efficient algorithm called RIS (Ranking Interesting Subspaces) to examine all such subspaces. A broad evaluation based on synthetic and real-world data sets empirically shows that RIS is suitable for finding all relevant subspaces in large, high-dimensional, sparse data and for ranking them accordingly.
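The abstract does not state the quality criterion itself, so the following is only a toy sketch of the general idea of scoring and ranking subspaces; it is not the RIS algorithm. It assumes data scaled to the unit cube and uses a made-up score: the observed mean number of neighbors within a radius `eps`, divided by the number expected for uniformly distributed data, so that subspaces of different dimensionality become comparable. All names (`mean_neighbors`, `rank_subspaces`, `make_demo_data`, `eps`, `max_dim`) are hypothetical and chosen for illustration.

```python
import itertools
import random


def mean_neighbors(points, dims, eps):
    """Average number of other points within eps of a point, measured
    with the maximum norm restricted to the dimensions in `dims`."""
    total = 0
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            if i != j and all(abs(p[d] - q[d]) <= eps for d in dims):
                total += 1
    return total / len(points)


def rank_subspaces(points, eps=0.1, max_dim=2):
    """Return (score, dims) pairs sorted from most to least dense.

    Assumed score (not the paper's criterion): observed mean neighbor
    count divided by the count expected for uniform data in the unit
    cube, (n - 1) * (2 * eps) ** k, ignoring boundary effects.
    """
    n = len(points)
    n_dims = len(points[0])
    scored = []
    for k in range(1, max_dim + 1):
        # Exhaustive enumeration of all k-dimensional subspaces.
        for dims in itertools.combinations(range(n_dims), k):
            expected = (n - 1) * (2 * eps) ** k
            scored.append((mean_neighbors(points, dims, eps) / expected, dims))
    scored.sort(reverse=True)
    return scored


def make_demo_data(n=200, seed=0):
    """Synthetic data: dimensions 0 and 1 form one tight cluster,
    dimensions 2 and 3 are uniform noise."""
    rng = random.Random(seed)

    def clamp(v):
        return min(max(v, 0.0), 1.0)

    return [
        (clamp(rng.gauss(0.5, 0.03)), clamp(rng.gauss(0.5, 0.03)),
         rng.random(), rng.random())
        for _ in range(n)
    ]


if __name__ == "__main__":
    # Show the three highest-scoring subspaces of the synthetic data.
    for score, dims in rank_subspaces(make_demo_data())[:3]:
        print(dims, round(score, 2))
```

On this synthetic data the subspace spanned by the two clustered dimensions should rank first, ahead of any subspace that includes a noise dimension. The sketch enumerates every subspace up to `max_dim`, which grows exponentially with the dimensionality; a practical method would need a more efficient search strategy than this brute-force loop.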

Bibliographic record

Source
7th European Conference on Principles and Practice of Knowledge Discovery in Databases; Sep 22-26, 2003; Cavtat-Dubrovnik, Croatia | 2003 | pp. 241-252 | 12 pages
Conference location: Cavtat-Dubrovnik (HR)
Authors
Karin Kailing; Hans-Peter Kriegel; Peer Kroeger; Stefanie Wanka

Affiliation
Institute for Computer Science, University of Munich, Oettingenstr. 67, 80538 Munich, Germany

Original format: PDF
Language of text: English
Chinese Library Classification: Automation technology; computer technology

