Journal: IEEE Transactions on Parallel and Distributed Systems

Efficient Data Placement and Replication for QoS-Aware Approximate Query Evaluation of Big Data Analytics



Abstract

Enterprise users at different geographic locations generate large volumes of data that are stored at different geographic datacenters. These users may also perform big data analytics on the stored data to identify valuable information in order to make strategic decisions. However, it is well known that performing big data analytics on data in geographically distributed datacenters is usually time-consuming and costly. In some delay-sensitive applications, a query result may become useless if answering the query takes too long. Users may therefore sometimes be interested only in timely approximate query results rather than exact ones. In such approximate query evaluation, applications must either sacrifice timeliness to obtain more accurate evaluation results, or tolerate an evaluation result with a guaranteed error bound, obtained by analyzing samples of the data, in order to meet their stringent timelines. In this paper, we study quality-of-service (QoS)-aware data replication and placement for approximate query evaluation of big data analytics in a distributed cloud, where the original (source) data of a query is distributed across different geo-distributed datacenters. We focus on the problems of placing samples of the source data at strategic datacenters to meet the stringent query delay requirements of users, by exploring a non-trivial trade-off between the cost of query evaluation and the error bound of the evaluation result. We first propose an approximation algorithm with a provable approximation ratio for a single approximate query. We then develop an efficient heuristic algorithm for evaluating a set of approximate queries, with the aim of minimizing the evaluation cost while meeting the delay requirements of these queries. We finally demonstrate the effectiveness and efficiency of the proposed algorithms through both experimental simulations and implementations in a real test-bed, using real datasets. Experimental results show that the proposed algorithms are promising.
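The trade-off between evaluation cost and error bound mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it only assumes a uniform random sample, a mean query, and a normal-approximation confidence interval, and it uses the number of rows scanned as a stand-in for evaluation cost. The function name `approximate_mean` and all parameters are hypothetical.

```python
import math
import random
import statistics


def approximate_mean(data, sample_fraction, z=1.96, seed=0):
    """Estimate the mean of `data` from a uniform random sample.

    Returns (estimate, error_bound, rows_scanned). The error bound is the
    half-width of a normal-approximation confidence interval
    (z = 1.96 corresponds to roughly 95% confidence). This is an
    illustrative assumption, not the bound used in the paper.
    """
    rng = random.Random(seed)
    # At least two rows are needed to compute a sample standard deviation.
    n = max(2, int(len(data) * sample_fraction))
    sample = rng.sample(data, n)
    estimate = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    return estimate, z * std_err, n


if __name__ == "__main__":
    # Synthetic stand-in for the source data of a query.
    gen = random.Random(42)
    data = [gen.gauss(100.0, 15.0) for _ in range(100_000)]
    for fraction in (0.001, 0.01, 0.1):
        est, bound, rows = approximate_mean(data, fraction)
        print(f"fraction={fraction}: estimate={est:.2f} "
              f"+/- {bound:.2f}, rows scanned={rows}")
```

Larger samples scan more rows but yield a tighter error bound, which is the tension a sample placement strategy has to balance against the query delay requirement.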
