Probabilistic data structures for big data analytics: A comprehensive review


Abstract

The past decade has seen an exponential increase in data-generating sources, driven by the evolution of technologies such as cloud computing, IoT, and social networking. This enormous growth of data has led to a paradigm shift in storage and retrieval patterns, from traditional data structures to Probabilistic Data Structures (PDS). PDS are a group of data structures that are extremely useful for big data and streaming applications because they avoid high-latency analytical processes. These data structures use hash functions to compactly represent a set of items in stream-based computing while providing approximations with error bounds, so that well-formed approximations are built directly into data collections. Compared to traditional data structures, PDS use much less memory and take constant time to process complex queries. This paper provides a detailed discussion of the issues normally encountered in massive data sets, such as storage, retrieval, and querying. It then discusses the role of PDS in solving these issues, where these data structures are used as temporary accumulators in query processing. Several variants of existing PDS, along with their application areas, are explored to give a holistic view of the domains where these data structures can be applied for efficient storage and retrieval of massive data sets. Mathematical proofs for the various parameters considered in PDS are also discussed. Finally, the PDS are compared with one another with respect to these parameters. (C) 2019 Elsevier B.V. All rights reserved.
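As a rough illustration of how hash functions can compactly represent a set while bounding the approximation error, the sketch below implements a small Bloom filter, one of the structures named in the keywords. It is not taken from the paper: the class name `BloomFilter`, its parameters, and the choice of SHA-256 with double hashing are assumptions made for this example, and the sizing uses the standard textbook formulas for a target false-positive rate.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter: k hash probes into an m-bit array."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(
            1,
            math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2),
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption for this sketch: derive all k positions from two 64-bit
        # halves of a single SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set the bit at every probed position.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


# Usage: 1,000 items at a 1% target false-positive rate need about 1.2 KB of bits.
bf = BloomFilter(expected_items=1000, false_positive_rate=0.01)
for i in range(1000):
    bf.add(f"user-{i}")
print("user-42" in bf)  # an inserted item is always reported as present
```

Memory stays fixed regardless of item length, and each query touches only `num_hashes` bits, which is the kind of constant-time, low-memory behavior the abstract attributes to PDS. The price is that a membership test can return a false positive, though never a false negative.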


Bibliographic details

Source
Knowledge-Based Systems | 2020, Issue 5 | pp. 104987.1-104987.21 | 21 pages
Author affiliations

Thapar Institute of Engineering and Technology (Deemed University), Computer Science and Engineering Department, Patiala, Punjab, India;

University of Sydney, Centre for Distributed and High Performance Computing, Sydney, NSW, Australia;

Original format: PDF
Language: English
Keywords
Big data; Internet of things (IoT); Probabilistic data structures; Bloom filter; Quotient filter; Count min sketch; HyperLogLog counter; MM-hash; Locality sensitive hashing;


