Scattering-based Quality Measures

Abstract

Different clustering algorithms, with their diverse settings, parameters, and initializations, generally produce different clustering solutions. It is therefore essential to compare and evaluate clustering results and to select the methods that best fit the “actual” data distribution. This can be achieved with informative quality metrics that reflect the “goodness” of the resulting solutions relative to the ground truth. Various extrinsic validation metrics have been proposed in the literature, including F-measure, Entropy, Rand Index, and Purity. However, the literature lacks a way to evaluate the level of divergence among multiple clusterings in an aggregate, especially in consensus clustering. In this paper, we propose three scattering measures that calculate the divergence level (i.e., scattering level) between two or more clustering algorithms: Scatter F-score, Scatter Entropy, and Scatter Purity. These measures are variants of the traditional F-measure, Entropy, and Purity quality measures, and they serve as pre-assessment criteria for deciding which clustering algorithms to combine in an aggregate. Experimental results on artificial, real, and text datasets show that the scattering measures play an important role in enhancing clustering quality in consensus clustering and in increasing the feasibility of the consensus.
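For readers unfamiliar with the traditional extrinsic measures the abstract builds on, the sketch below computes Purity, Entropy, and F-measure for one clustering against ground-truth labels. It uses common textbook definitions (an assumption: base-2 logarithm, unnormalized size-weighted entropy, and the class-weighted "best matching cluster" F-measure). It does not implement the paper's Scatter F-score, Scatter Entropy, or Scatter Purity, whose exact definitions are not given here, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def contingency(labels_true, labels_pred):
    """Count n_kj: number of points in predicted cluster k with true class j."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have equal length")
    return Counter(zip(labels_pred, labels_true))


def purity(labels_true, labels_pred):
    """Fraction of points that belong to the majority class of their cluster."""
    table = contingency(labels_true, labels_pred)
    best = {}
    for (k, _j), count in table.items():
        best[k] = max(best.get(k, 0), count)
    return sum(best.values()) / len(labels_true)


def entropy(labels_true, labels_pred):
    """Size-weighted class entropy of each cluster (base 2); lower is better."""
    table = contingency(labels_true, labels_pred)
    n = len(labels_true)
    cluster_sizes = Counter(labels_pred)
    total = 0.0
    for (k, _j), count in table.items():
        p = count / cluster_sizes[k]  # share of class j inside cluster k
        total -= (cluster_sizes[k] / n) * p * log2(p)
    return total


def f_measure(labels_true, labels_pred):
    """For each class, take the best F score over clusters; weight by class size."""
    table = contingency(labels_true, labels_pred)
    n = len(labels_true)
    cluster_sizes = Counter(labels_pred)
    class_sizes = Counter(labels_true)
    best = {j: 0.0 for j in class_sizes}
    for (k, j), count in table.items():
        precision = count / cluster_sizes[k]
        recall = count / class_sizes[j]
        f = 2 * precision * recall / (precision + recall)
        best[j] = max(best[j], f)
    return sum(class_sizes[j] / n * best[j] for j in class_sizes)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    pred = [0, 0, 1, 1, 1, 1]
    print(purity(truth, pred), entropy(truth, pred), f_measure(truth, pred))
```

In this toy example one point of class 0 is placed in the cluster dominated by class 1, so Purity is 5/6 and F-measure is about 0.83, while Entropy rises above 0. Identical labelings give Purity 1, Entropy 0, and F-measure 1.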