Efficient Data Placement and Replication for QoS-Aware Approximate Query Evaluation of Big Data Analytics

Xia Qiufen; Xu Zichuan; Liang Weifa; Yu Shui; Guo Song; Zomaya Albert Y.

Abstract

Enterprise users at different geographic locations generate large volumes of data that are stored at different geographic datacenters. These users may also perform big data analytics on the stored data to identify valuable information for making strategic decisions. However, it is well known that performing big data analytics on data in geographically distributed datacenters is usually time-consuming and costly. In some delay-sensitive applications, a query result may become useless if answering the query takes too long. Instead, users may sometimes only be interested in timely approximate query results rather than exact ones. In such approximate query evaluation, applications must either sacrifice timeliness to obtain more accurate evaluation results, or tolerate an evaluation result with a guaranteed error bound, obtained by analyzing samples of the data, in order to meet their stringent time constraints. In this paper, we study quality-of-service (QoS)-aware data replication and placement for approximate query evaluation of big data analytics in a distributed cloud, where the original (source) data of a query is distributed across different geo-distributed datacenters. We focus on the problem of placing samples of the source data at strategic datacenters to meet the stringent query delay requirements of users, by exploring a non-trivial trade-off between the cost of query evaluation and the error bound of the evaluation result. We first propose an approximation algorithm with a provable approximation ratio for a single approximate query. We then develop an efficient heuristic algorithm for evaluating a set of approximate queries, with the aim of minimizing the evaluation cost while meeting the delay requirements of these queries. We finally demonstrate the effectiveness and efficiency of the proposed algorithms through both experimental simulations and implementations in a real test-bed, using real datasets. Experimental results show that the proposed algorithms are promising.
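
The trade-off the abstract describes, between evaluation cost and the error bound of a sample-based result, can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a simple aggregate query (a mean), uniform random sampling, and a normal-approximation 95% confidence interval as the "error bound". The function name `approximate_mean` and the synthetic data are hypothetical and introduced only for illustration.

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, z=1.96, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, error_bound), where error_bound is the half-width
    of an approximate 95% confidence interval (normal approximation).
    This is an illustrative assumption, not the method from the paper.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)  # sampling without replacement
    estimate = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, z * std_err


# Hypothetical source data standing in for a large stored dataset.
_rng = random.Random(42)
data = [_rng.gauss(100.0, 15.0) for _ in range(100_000)]

# Larger samples cost more to scan and move, but tighten the error bound.
for n in (100, 1_000, 10_000):
    est, err = approximate_mean(data, n)
    print(f"sample={n:>6}  estimate={est:8.3f}  +/- {err:.3f}")
```

Under these assumptions the error bound shrinks roughly with the square root of the sample size, so a query with a tight deadline could be answered from a small sample at the price of a wider bound.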

Bibliographic details

Source
IEEE Transactions on Parallel and Distributed Systems, 2019, Issue 12, pp. 2677-2691, 15 pages
Authors
Xia Qiufen; Xu Zichuan; Liang Weifa; Yu Shui; Guo Song; Zomaya Albert Y.;
Author affiliations

Dalian Univ Technol Key Lab Ubiquitous Network & Serv Software Liaoni Int Sch Informat Sci & Engn Dalian 116024 Liaoning Peoples R China;

Dalian Univ Technol Sch Software Dalian 116024 Liaoning Peoples R China;

Australian Natl Univ Res Sch Comp Sci Canberra ACT 2601 Australia;

Univ Technol Sydney Sch Software Ultimo NSW 2007 Australia;

Hong Kong Polytech Univ Dept Comp Hung Hom Hong Kong Peoples R China;

Univ Sydney Sch Comp Sci Camperdown NSW 2006 Australia;

Original format: PDF
Language of text: English
Keywords
Big Data; Query processing; Delays; Approximation algorithms; Quality of service; Distributed databases; Software; Data replication and placement; big data analytics; approximate query evaluation; approximation algorithms; algorithm analysis;


Similar literature

1. Cost-Efficient QoS-Aware Data Acquisition Point Placement for Advanced Metering Infrastructure [J]. Fariba Aalamifar, Lutz Lampe. Communications, IEEE Transactions on. 2018, Issue 12
2. Efficiently processing deterministic approximate aggregation query on massive data [J]. Han Xixian, Wang Bailing, Li Jianzhong. Knowledge and information systems. 2018, Issue 2
3. An efficient data placement for query-set-based broadcasting in mobile environments [J]. Guang-Ming Wu. Computer Communications. 2007, Issue 5
4. QoS-aware data replications and placements for query evaluation of big data analytics [C]. Qiufen Xia, Weifa Liang, Zichuan Xu. IEEE International Conference on Communications. 2017
5. Accelerating Analytical Query Processing with Data Placement Conscious Optimization and RDMA-Aware Query Execution [D]. Liu, Feilong. 2018
6. StreamQRE: Modular Specification and Efficient Evaluation of Quantitative Queries over Streaming Data [O]. Konstantinos Mamouras, Mukund Raghothaman, Rajeev Alur
7. QoS-Aware Approximate Query Processing for Smart Cities Spatial Data Streams [O]. Isam Mashhour Al Jawarneh, Paolo Bellavista, Antonio Corradi. 2021
