WarpDrive: Massively Parallel Hashing on Multi-GPU Nodes

Abstract

Hash maps are among the most versatile data structures in computer science because of their compact data layout and expected constant time complexity for insertion and querying. However, associated memory access patterns during the probing phase are highly irregular resulting in strongly memory-bound implementations. Massively parallel accelerators such as CUDA-enabled GPUs may overcome this limitation by virtue of their fast video memory featuring almost one TB/s bandwidth in comparison to main memory modules of state-of-the-art CPUs with less than 100 GB/s. Unfortunately, the size of hash maps supported by existing single-GPU hashing implementations is restricted by the limited amount of available video RAM. Hence, hash map construction and querying that scales across multiple GPUs is urgently needed in order to support structured storage of bigger datasets at high speeds. In this paper, we introduce WarpDrive - a scalable, distributed single-node multi-GPU implementation for the construction and querying of billions of key-value pairs. We propose a novel subwarp-based probing scheme featuring coalesced memory access over consecutive memory regions in order to mitigate the high latency of irregular access patterns. Our implementation achieves 1.4 billion insertions per second in single-GPU mode for a load factor of 0.95 thereby outperforming the GPU-cuckoo implementation of the CUDPP library by a factor of 2.8 on a P100. Furthermore, we present transparent scaling to multiple GPUs within the same node with up to 4.3 billion operations per second for high load factors on four P100 GPUs connected by NVLink technology. WarpDrive is free software and can be downloaded at https://github.com/sleeepyjack/warpdrive.
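
The window-based probing idea mentioned in the abstract can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the authors' CUDA implementation: the class name `WindowedHashMap`, the parameters `num_windows` and `group_size`, the use of Python's built-in `hash`, and the linear stepping from one window to the next are all hypothetical choices made for illustration. Each window of `group_size` consecutive slots stands in for the memory region that one subwarp would inspect with a single coalesced access, one slot per lane.

```python
EMPTY = None  # sentinel for an unoccupied slot (assumption of this sketch)


class WindowedHashMap:
    """Open-addressing map probed one window of consecutive slots at a time.

    Hypothetical CPU simulation: on a GPU, the `group_size` slots of a window
    would be read by the lanes of one subwarp in a single coalesced access.
    """

    def __init__(self, num_windows, group_size=4):
        self.num_windows = num_windows
        self.group_size = group_size
        self.slots = [EMPTY] * (num_windows * group_size)

    def _window_starts(self, key):
        # Start at the key's home window, then step linearly over all windows.
        home = hash(key) % self.num_windows
        for i in range(self.num_windows):
            yield ((home + i) % self.num_windows) * self.group_size

    def insert(self, key, value):
        for start in self._window_starts(key):
            window = range(start, start + self.group_size)
            # Pass 1: every "lane" compares its slot against the key.
            for idx in window:
                slot = self.slots[idx]
                if slot is not EMPTY and slot[0] == key:
                    self.slots[idx] = (key, value)  # update existing entry
                    return True
            # Pass 2: the first lane that saw an empty slot claims it.
            for idx in window:
                if self.slots[idx] is EMPTY:
                    self.slots[idx] = (key, value)
                    return True
        return False  # every window is full

    def get(self, key, default=None):
        for start in self._window_starts(key):
            saw_empty = False
            for idx in range(start, start + self.group_size):
                slot = self.slots[idx]
                if slot is EMPTY:
                    saw_empty = True
                elif slot[0] == key:
                    return slot[1]
            # Without deletions, a window with a free slot ends the search:
            # the key would have been placed here rather than further on.
            if saw_empty:
                return default
        return default
```

A short usage example: `m = WindowedHashMap(num_windows=4, group_size=4)` creates 16 slots; after `m.insert(k, k * k)` for the keys 0 to 14 the table is filled to 15 of 16 slots, and `m.get(k)` still returns each stored value. Inspecting a whole window per probing step reduces the number of scattered memory accesses per lookup, which is the effect the abstract attributes to coalesced access over consecutive memory regions.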

Bibliographic details

Source
IEEE International Parallel and Distributed Processing Symposium, 2018, pp. 441-450 (10 pages)
Authors
Daniel Jünger; Christian Hundt; Bertil Schmidt
Original format
PDF
Keywords
Graphics processing units; Computer science; Memory management; Data structures; Electronic mail; Layout; Bandwidth
