Proceedings of 2010 International Conference on Communication and Computational Intelligence
Cluster the unlabeled datasets using Extended Dark Block Extraction

Abstract

Clustering analysis is the problem of partitioning a set of objects O = {o1, ..., on} into c self-similar subsets based on available data. In general, clustering unlabeled data poses three major problems: 1) assessing cluster tendency, i.e., how many clusters to seek; 2) partitioning the data into c meaningful groups; and 3) validating the c clusters that are discovered. We address the first problem, i.e., determining the number of clusters c prior to clustering. Many clustering algorithms require the number of clusters as an input parameter, so the quality of the clusters depends mainly on this value. Most methods for choosing c are post-clustering measures of cluster validity, i.e., they attempt to choose the best partition from a set of alternative partitions. In contrast, tendency assessment attempts to estimate c before clustering occurs. Here, we represent the structure of an unlabeled data set as a Reordered Dissimilarity Image (RDI), in which the pairwise dissimilarity information about a data set of n objects is represented as an n × n image. The RDI is generated using VAT (Visual Assessment of Cluster Tendency) and highlights potential clusters as a set of dark blocks along the diagonal of the image. The number of clusters can therefore be estimated by counting the dark blocks along the diagonal. We develop a new method called Extended Dark Block Extraction (EDBE) for counting the number of clusters that appear as dark blocks along the diagonal of the RDI. The EDBE method combines several image and signal processing techniques.
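
To make the RDI idea concrete, the following is a minimal sketch of the VAT reordering step as it is commonly described: a Prim-like ordering that starts from one endpoint of the largest dissimilarity and repeatedly appends the unordered object closest to the objects already ordered. It assumes Euclidean dissimilarities, and the function names `vat_order` and `reordered_dissimilarity_image` are hypothetical. It does not implement EDBE itself, since the abstract does not spell out which image and signal processing steps EDBE uses.

```python
import numpy as np


def vat_order(D):
    """Return a VAT-style ordering for a symmetric dissimilarity matrix D.

    Assumption: this follows the usual textbook description of VAT,
    not code published with the paper.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Start from the row index of the largest dissimilarity.
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(i)]
    remaining = set(range(n)) - {int(i)}
    # min_dist[k]: smallest dissimilarity from object k to any ordered object.
    min_dist = D[i].copy()
    while remaining:
        candidates = sorted(remaining)
        nxt = candidates[int(np.argmin(min_dist[candidates]))]
        order.append(nxt)
        remaining.remove(nxt)
        min_dist = np.minimum(min_dist, D[nxt])
    return order


def reordered_dissimilarity_image(X):
    """Build the n x n Euclidean dissimilarity matrix of X and reorder it."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    order = vat_order(D)
    # Apply the same permutation to rows and columns.
    return D[np.ix_(order, order)], order


if __name__ == "__main__":
    # Two well-separated groups, deliberately interleaved in the input.
    X = [[0, 0], [10, 10], [0.1, 0], [10, 10.1], [0, 0.1], [10.1, 10]]
    rdi, order = reordered_dissimilarity_image(X)
    print(order)
    print(np.round(rdi, 1))
```

Displayed as a grayscale image with small values dark, the reordered matrix for this toy input shows two low-dissimilarity blocks on the diagonal, one per group. Counting such blocks automatically is the task the abstract assigns to EDBE.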