Knowledge-Based Systems

Probabilistic data structures for big data analytics: A comprehensive review

Abstract

An exponential increase in data generation resources has been widely observed over the last decade, driven by the evolution of technologies such as cloud computing, IoT, and social networking. This enormous and unbounded growth of data has led to a paradigm shift in storage and retrieval patterns, from traditional data structures to Probabilistic Data Structures (PDS). PDS are a group of data structures that are extremely useful for big data and streaming applications because they avoid high-latency analytical processes. These data structures use hash functions to compactly represent a set of items in stream-based computing while providing approximations with error bounds, so that well-formed approximations are built directly into data collections. Compared to traditional data structures, PDS use much less memory and take constant time to process complex queries. This paper provides a detailed discussion of the issues normally encountered in massive data sets, such as storage, retrieval, and querying. Further, the role of PDS in solving these issues is discussed, where these data structures are used as temporary accumulators in query processing. Several variants of existing PDS, along with their application areas, are also explored, giving a holistic view of the domains where these data structures can be applied for efficient storage and retrieval of massive data sets. Mathematical proofs of the various parameters considered in PDS are also discussed. Moreover, a relative comparison of the various PDS with respect to these parameters is presented. (C) 2019 Elsevier B.V. All rights reserved.
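To make the idea of hashing items into a compact representation with a bounded error more concrete, here is a minimal sketch of a Bloom filter in Python. The abstract does not name a specific structure, so the choice of a Bloom filter is an illustrative assumption, as are the class name `BloomFilter`, its parameters, and the use of SHA-256 with double hashing. The sizing formulas are the standard ones for a target false-positive rate.

```python
import hashlib
import math


class BloomFilter:
    """Approximate set membership: no false negatives, tunable false positives.

    Hypothetical illustration, not the implementation from the reviewed paper.
    """

    def __init__(self, expected_items: int, false_positive_rate: float):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(
            1,
            math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2),
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from two 64-bit halves of one SHA-256 digest
        # (double hashing), instead of computing k independent hash functions.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the stride is nonzero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    seen = BloomFilter(expected_items=1000, false_positive_rate=0.01)
    for i in range(1000):
        seen.add(f"user-{i}")
    print("user-42" in seen)            # an inserted item is always reported as present
    print(len(seen.bits), "bytes used")  # memory is fixed by the sizing, not by item length
```

Insertion and lookup each touch a fixed number of bits, which is the sense in which such structures offer constant-time queries in a small, fixed memory footprint, at the cost of occasional false positives.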
