Conference: International Workshop on Complex Networks and Their Applications

Analysis of the Web Graph Aggregated by Host and Pay-Level Domain


Abstract

This paper analyzes the web as a graph aggregated by host and by pay-level domain (PLD). The web graph datasets, released publicly by the Common Crawl Foundation (http://commoncrawl.org), are based on a web crawl performed from May to July 2017. The host graph has ~1.3 billion nodes and ~5.3 billion arcs. The PLD graph has ~91 million nodes and ~1.1 billion arcs. We study the distributions of degree and of the sizes of strongly and weakly connected components (SCC/WCC), focusing on power-law detection using statistical methods. The statistical plausibility of the power-law model is compared with that of several alternative distributions. While there is no evidence of power-law tails at the host level, they emerge under PLD aggregation for the indegree, SCC size, and WCC size distributions. Finally, we analyze distance-related features by studying the cumulative distributions of shortest path lengths, and we estimate the diameters of the graphs.
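The quantities named in the abstract can be illustrated on a toy graph. The sketch below is not the authors' pipeline: the abstract does not say which estimator or tooling was used, so the tail-exponent formula here is an assumption (the widely used approximate maximum-likelihood estimator for discrete data, alpha ≈ 1 + n / Σ ln(x_i / (x_min − 0.5))), and the host names, the arc list, and the function names `indegrees`, `wcc_sizes`, and `tail_exponent` are all invented for illustration. It covers only indegree, WCC sizes, and the exponent estimate; the comparison against alternative distributions is not shown.

```python
import math
from collections import Counter


def indegrees(arcs, nodes):
    """Return the indegree of each node, in the order given by `nodes`."""
    deg = Counter(target for _, target in arcs)
    return [deg[v] for v in nodes]


def wcc_sizes(arcs, nodes):
    """Return weakly connected component sizes, largest first.

    Arc direction is ignored; components are merged with union-find.
    """
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for source, target in arcs:
        root_s, root_t = find(source), find(target)
        if root_s != root_t:
            parent[root_s] = root_t
    return sorted(Counter(find(v) for v in nodes).values(), reverse=True)


def tail_exponent(values, xmin):
    """Approximate MLE of a power-law exponent for the tail values >= xmin.

    Assumes integer data and xmin >= 1. Returns (alpha, tail_size).
    """
    if xmin < 1:
        raise ValueError("xmin must be at least 1")
    tail = [x for x in values if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(x / (xmin - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum, len(tail)


# Hypothetical host-level toy graph: each arc is (source host, target host).
nodes = ["a.example", "b.example", "c.example", "d.example", "e.example"]
arcs = [
    ("a.example", "b.example"),
    ("b.example", "a.example"),
    ("c.example", "b.example"),
    ("d.example", "e.example"),
]

print(indegrees(arcs, nodes))   # [1, 2, 0, 0, 1]
print(wcc_sizes(arcs, nodes))   # [3, 2]
```

On real data the lower cutoff `xmin` would itself have to be chosen, for example by minimizing a goodness-of-fit distance over candidate cutoffs; here it is simply passed in. An exponent estimate alone does not establish that a power law is plausible, which is why the abstract also reports comparisons against alternative distributions.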
