Conference: International Workshop on Complex Networks and Their Applications

Is Community Detection Fully Unsupervised? The Case of Weighted Graphs

Abstract

In the field of NLP, word embeddings have recently attracted a lot of attention. A textual corpus is represented as a sparse word co-occurrence matrix. This matrix can then be factorized, for example using SVD, which yields a smaller matrix of dense, continuous vectors. To help SVD, the PMI measure is applied to the initial co-occurrence matrix: it assigns a relevant weight to each co-occurrence by normalizing it by the frequencies of both words involved. In this paper, we follow this idea to study whether weighted networks can benefit from a preprocessing step that helps community detection. We first design a benchmark using LFR networks. Then, we consider PMI and another NLP-inspired measure as a preprocessing of the link weights, and show that PMI worsens the results while the other measure improves them. By separating links inside communities and links between communities into two classes, we show that this is due to the weight distributions of these links. Links between communities have larger weights on average, which leads to larger PMI values. From this analysis, we design another set of experiments showing that links can be classified efficiently into these two classes using a small set of features. Finally, we introduce the Supervised Label Propagation (SLP) algorithm, which takes the classification results into account during the propagation. This algorithm clearly improves the results, leading us to a major question: is community detection on weighted networks a fully unsupervised task? We conclude with our thoughts on this topic.
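The PMI preprocessing mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' exact formulation, so this assumes the standard definition applied to a symmetric weighted graph: the joint probability of a link is its weight divided by the sum of all entries of the weight matrix, and the marginal of a node is its strength (sum of incident weights) divided by the same total. The function name `pmi_reweight` and the edge-dictionary input format are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def pmi_reweight(edges):
    """Replace each link weight by its pointwise mutual information.

    Assumption (not taken from the paper): `edges` maps an undirected
    pair (u, v) to a positive weight, with one entry per link, and
    PMI(u, v) = log( p(u, v) / (p(u) * p(v)) ) with
      p(u, v) = w_uv / S,  p(u) = strength(u) / S,
    where S is the sum of all node strengths (twice the total link weight).
    """
    strength = defaultdict(float)
    for (u, v), w in edges.items():
        strength[u] += w
        strength[v] += w
    total = sum(strength.values())  # S: sum of the symmetric weight matrix

    return {
        (u, v): math.log(w * total / (strength[u] * strength[v]))
        for (u, v), w in edges.items()
    }


if __name__ == "__main__":
    # Toy graph: a heavy link between two otherwise weakly connected nodes.
    toy = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "d"): 5.0}
    for link, value in pmi_reweight(toy).items():
        print(link, round(value, 3))
```

Under this normalization, a link's new weight depends on how large it is relative to the strengths of its two endpoints, which is the property the abstract relates to the differing weight distributions of intra-community and inter-community links.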