IEEE Global Communications Conference

The Impact of Sampling on Big Data Analysis of Social Media: A Case Study on Flu and Ebola

Abstract

The explosive growth of online social networks in recent years has generated massive data sets covering user behaviors, social graphs, and content. Given the scale, heterogeneity, and diversity of such big data, sampling is a simple and intuitive way to reduce the size of the data sets used for collecting, measuring, and understanding users, behaviors, and traffic in online social networks. In this paper, we quantify the impact of random sampling on the analysis of online social networks, using Twitter streaming data as a case study. In addition, we design different sampling strategies, including community sampling and stratified sampling, and evaluate their impact on a broad range of behavioral characteristics of online social networks. Our experimental results show that community sampling has the least impact on the distribution of tweets across users and on the structure of retweeting graphs, while achieving data reductions similar to those of random and stratified sampling.
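
The three sampling strategies named in the abstract can be illustrated with a minimal Python sketch. The abstract does not describe how the paper implements them, so everything here is an assumption made for illustration: the function names, the synthetic tweets, the choice of the user as the stratum, and the reading of community sampling as "keep every tweet from a randomly chosen subset of communities".

```python
import random
from collections import Counter, defaultdict


def random_sample(tweets, rate, rng):
    """Keep each tweet independently with probability `rate`."""
    return [t for t in tweets if rng.random() < rate]


def stratified_sample(tweets, rate, rng, key):
    """Group tweets by `key` (the stratum) and draw the same fraction from each group."""
    strata = defaultdict(list)
    for t in tweets:
        strata[key(t)].append(t)
    sample = []
    for group in strata.values():
        k = max(1, round(rate * len(group)))  # keep at least one tweet per stratum
        sample.extend(rng.sample(group, k))
    return sample


def community_sample(tweets, rate, rng, community_of):
    """Pick a fraction of communities and keep every tweet posted by their members."""
    communities = sorted({community_of[t["user"]] for t in tweets})
    k = max(1, round(rate * len(communities)))
    kept = set(rng.sample(communities, k))
    return [t for t in tweets if community_of[t["user"]] in kept]


def tweets_per_user(tweets):
    """Count how many tweets each user contributes."""
    return Counter(t["user"] for t in tweets)


# Synthetic data (hypothetical): 20 users in 4 communities; user i posts i + 1 tweets.
rng = random.Random(0)
users = [f"u{i}" for i in range(20)]
community_of = {u: i // 5 for i, u in enumerate(users)}
tweets = [
    {"user": u, "text": f"tweet {j}"}
    for i, u in enumerate(users)
    for j in range(i + 1)
]

full = tweets_per_user(tweets)
by_random = random_sample(tweets, 0.5, rng)
by_strata = stratified_sample(tweets, 0.5, rng, key=lambda t: t["user"])
by_community = community_sample(tweets, 0.5, rng, community_of)

for name, sample in [("random", by_random), ("stratified", by_strata), ("community", by_community)]:
    counts = tweets_per_user(sample)
    print(f"{name}: {len(sample)} tweets from {len(counts)} users")
```

Under these assumptions, community sampling leaves the per-user tweet counts of the retained users unchanged, because it drops whole communities rather than individual tweets. This is one plausible intuition for the result reported in the abstract, not a reproduction of the paper's experiment.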
