JMLR: Workshop and Conference Proceedings

Crowdclustering with Partition Labels

Abstract

Crowdclustering is a practical way to incorporate domain knowledge into clustering by combining the opinions of multiple domain experts. Existing crowdclustering methods analyze binary pairwise similarity labels. In some applications, however, experts provide partition labels instead. Converting partition labels into pairwise similarity labels makes it difficult to understand how the clustering solutions of different experts relate to one another. In this paper, we propose a crowdclustering model that analyzes partition labels directly. The model is based on a modified multinomial logistic regression model that simultaneously learns the number of clusters and the hyperplanes that partition samples into clusters. It also learns a mapping between the latent clusters and the expert labels, revealing where experts agree and disagree. Experiments on benchmark data demonstrate that the proposed model learns the number of clusters while discovering the clustering structure. An experiment on a disease subtyping problem illustrates that the model helps us understand the agreements and disagreements between experts.
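The sketch below shows two ideas from the abstract. The first is the pairwise conversion that existing methods rely on, and why it hides how two experts' partitions relate. The second is how a multinomial logistic regression turns hyperplane scores into cluster assignments. This is not the paper's model, and its names are hypothetical: `partition_to_pairwise`, `contingency_table`, `mlr_cluster_probs`, and the toy labels. `mlr_cluster_probs` is plain, unmodified multinomial logistic regression. The paper's modification, which also learns the number of clusters and the cluster-to-expert-label mapping, is not reproduced here.

```python
import math
from collections import Counter
from itertools import combinations


def partition_to_pairwise(labels):
    """Turn one expert's partition labels into binary pairwise similarity labels:
    1 if samples i and j share a cluster, 0 otherwise, for every pair i < j."""
    return {
        (i, j): int(labels[i] == labels[j])
        for i, j in combinations(range(len(labels)), 2)
    }


def contingency_table(labels_a, labels_b):
    """Count samples in each (cluster of expert A, cluster of expert B) cell."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both experts must label the same samples")
    return Counter(zip(labels_a, labels_b))


def mlr_cluster_probs(x, weights, biases):
    """Plain multinomial logistic regression: p(k | x) is proportional to
    exp(w_k . x + b_k). Differences between two clusters' scores define the
    hyperplane separating them; x goes to the arg-max cluster."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy labels for six samples (not data from the paper).
expert_a = ["x", "x", "x", "x", "y", "y"]  # expert A uses two clusters
expert_b = [0, 0, 1, 1, 2, 2]              # expert B splits A's "x" into 0 and 1

pairs_a = partition_to_pairwise(expert_a)
pairs_b = partition_to_pairwise(expert_b)

# The pairwise view only tells us which pairs the experts disagree on.
disagreements = [p for p in pairs_a if pairs_a[p] != pairs_b[p]]
print(disagreements)  # → [(0, 2), (0, 3), (1, 2), (1, 3)]

# The contingency table shows the structure: A's "x" maps onto B's 0 and 1.
table = contingency_table(expert_a, expert_b)
print(sorted(table.items()))  # → [(('x', 0), 2), (('x', 1), 2), (('y', 2), 2)]

# Three hypothetical hyperplane scores in 2-D; the point (1, 0) lands in cluster 0.
probs = mlr_cluster_probs([1.0, 0.0],
                          weights=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
                          biases=[0.0, 0.0, 0.0])
print(max(range(len(probs)), key=probs.__getitem__))  # → 0
```

In this toy case, expert B's partition refines expert A's. The pairwise labels only list four disagreeing pairs. The contingency table, by contrast, shows directly that A's cluster "x" corresponds to B's clusters 0 and 1. The abstract's learned mapping between latent clusters and expert labels is described as revealing this kind of agreement and disagreement.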