Journal: International Journal of Computer Systems Science & Engineering

Heavy hitter identification based on adaptive sampling with MapReduce


Abstract

As network bandwidth continues to increase, identifying heavy hitters becomes more important for network applications such as network management and network accounting. MapReduce is a powerful tool for parallel and distributed processing of large-scale data. Unfortunately, the performance of MapReduce depends strongly on an even data distribution, whereas network traffic is heavy-tailed. The resulting load imbalance delays the completion of some tasks and significantly degrades performance. In addition, existing methods for identifying heavy hitters lack scalability. In this paper, we propose a heavy hitter identification method based on adaptive sampling with MapReduce that addresses the load imbalance problem. The method introduces adaptive sampling, which adjusts the sampling rate according to the counter value as it is updated, so that the original flow size distribution can be estimated accurately from a small sampled fraction of the network traffic. Based on the estimated distribution, we present a data partitioning strategy, and heavy hitters are identified during the reduce stage using this strategy. The method also accounts for the impact of changes in network traffic over time on load balancing and execution time. Experiments on real network traffic show that the proposed method outperforms two other methods in terms of estimation accuracy, load balancing, scalability, data updating, and the number of reducers.
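The abstract does not give the sampling rate function or the partitioning rule, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes a hypothetical rate p(c) = 1 / (1 + a·c) that falls as a flow's counter c grows, an estimator that sums 1 / p(i) over the counter steps taken (which is unbiased for this kind of counter), and a greedy largest-first assignment of flows to reducers. The names `sampling_rate`, `estimate_size`, `sample_flows`, `partition_flows`, and `heavy_hitters`, and the parameter `a`, are illustrative and do not come from the paper.

```python
import heapq
import random
from collections import defaultdict


def sampling_rate(counter, a=0.05):
    """Assumed rate: probability of counting the next packet drops as the counter grows."""
    return 1.0 / (1.0 + a * counter)


def estimate_size(counter, a=0.05):
    """Estimate the flow size as the sum of 1/p(i) for i in [0, counter).

    With p(i) = 1 / (1 + a*i) this sum has the closed form c + a*c*(c-1)/2.
    """
    return counter + a * counter * (counter - 1) / 2.0


def sample_flows(packets, a=0.05, rng=None):
    """Run the adaptive counter over a stream of flow IDs and return size estimates."""
    rng = rng or random.Random(0)
    counters = defaultdict(int)
    for flow_id in packets:
        # The first packet of a flow is always counted because p(0) = 1.
        if rng.random() < sampling_rate(counters[flow_id], a):
            counters[flow_id] += 1
    return {flow: estimate_size(c, a) for flow, c in counters.items()}


def partition_flows(estimates, num_reducers):
    """Greedy assignment (an assumption): largest flows first, each to the least-loaded reducer."""
    heap = [(0.0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    assignment = {}
    by_size = sorted(estimates.items(), key=lambda kv: kv[1], reverse=True)
    for flow_id, size in by_size:
        load, reducer = heapq.heappop(heap)
        assignment[flow_id] = reducer
        heapq.heappush(heap, (load + size, reducer))
    return assignment


def heavy_hitters(estimates, threshold):
    """Flows whose estimated size reaches the threshold."""
    return {flow for flow, size in estimates.items() if size >= threshold}
```

In a MapReduce setting, the assignment returned by `partition_flows` would play the role of a custom partitioner that decides which reducer receives each flow key, while `heavy_hitters` stands in for the threshold check done in the reduce stage. How the paper actually performs either step is not stated in the abstract.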

Bibliographic details

  • Source
  • Author affiliations

    School of Computer Science and Engineering, Southeast University, Nanjing 211189, China; School of Computer Science and Technology, Taizhou University, Taizhou 225300, China; Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, Nanjing 211189, China

    School of Computer Science and Engineering, Southeast University, Nanjing 211189, China; Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, Nanjing 211189, China

    School of Computer Science and Engineering, Southeast University, Nanjing 211189, China; Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, Nanjing 211189, China

    School of Computer Science and Engineering, Southeast University, Nanjing 211189, China; Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, Nanjing 211189, China

    School of Computer Science and Engineering, Southeast University, Nanjing 211189, China; Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, Nanjing 211189, China

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • CLC classification
  • Keywords

    MapReduce; network measurement; heavy hitter; sampling
