Parallel Suffix Array and Least Common Prefix for the GPU

机译：PPU的并行后缀数组和最少的常见前缀

获取原文

页面导航

摘要
著录项
相似文献
相关主题

摘要

Suffix Array (SA) is a data structure formed by sorting the suffixes of a string into lexicographic order. SAs have been used in a variety of applications, most notably in pattern matching and Burrows-Wheeler Transform (BWT) based lossless data compression. SAs have also become the data structure of choice for many, if not all, string processing problems to which suffix tree methodology is applicable. Over the last two decades researchers have proposed many suffix array construction algorithm (SACAs). We do a systematic study of the main classes of SACAs with the intent of mapping them onto a data parallel architecture like the GPU. We conclude that skew algorithm [12], a linear time recursive algorithm, is the best candidate for GPUs as all its phases can be efficiently mapped to a data parallel hardware. Our OpenCL implementation of skew algorithm achieves a throughput of up to 25 MStrings/sec and a speedup of up to 34× and 5.8× over a single threaded CPU implementation using a discrete GPU and APU respectively. We also compare our OpenCL implementation against the fastest known CPU implementation based on induced copying and achieve a speedup of up to 3.7×. Using SA we construct BWT on GPU and achieve a speedup of 11× over the fastest known BWT on GPU. Suffix arrays are often augmented with the longest common prefix (LCP) information. We design a novel high-performance parallel algorithm for computing LCP on the GPU. Our GPU implementation of LCP achieves a speedup of up to 25× and 4.3× on discrete GPU and APU respectively.

机译：后缀阵列（SA）是通过将字符串的后缀分类为词典顺序而形成的数据结构。 SAS已被用于各种应用中，最值得注意的是，基于模式匹配和挖掘机轮车变换（BWT）的无损数据压缩。 SAS还成为许多，如果不是全部的选择的数据结构，则为后缀树方法的字符串处理问题。在过去的二十年中，研究人员提出了许多后缀阵列施工算法（SACAS）。我们对SACA的主要类别进行了系统的研究，目的是将它们映射到像GPU这样的数据并行架构上。我们得出结论，偏斜算法[12]，线性时间递归算法是GPU的最佳候选者，因为它的所有相位可以有效地映射到数据并行硬件。我们的OpenCL实现偏斜算法的实现可以使用分立的GPU和APU的单线CPU实现实现多达25辆/秒的吞吐量，最多34×和5.8倍。我们还基于诱导的复制，将OpenCL实现与最快的已知CPU实现进行比较，并达到高达3.7×的加速。使用SA在GPU上构建BWT并在GPU上最快的已知BWT实现11倍的加速。后缀阵列通常使用最长的常见前缀（LCP）信息来增强。我们设计了一种用于在GPU上计算LCP的新型高性能并行算法。我们的GPU实施LCP，分别在离散GPU和APU上实现了高达25倍和4.3倍的加速。

著录项

来源
《ACM SIGPLAN Symposium on Priciples and Practice of Parallel Programming》|2013年|320p|共9页
会议地点
作者
Mrinal Deo; Sean Keely;
展开▼
作者单位

展开▼
会议组织
原文格式 PDF
正文语种
中图分类 TP311.5-53;
关键词
Algorithms; Performance; Suffix Array; Suffix Tree; BWT; Longest Common Prefix; OpenCL; GPU; Prefix Sum; Accelerated Processing Unit (APU);

机译：算法;性能;后缀数组;后缀树;BWT;最长的常见前缀;OpenCL;GPU;前缀总和;加速处理单元（APU）;

相似文献

外文文献
中文文献
专利

1. Parallel Suffix Array and Least Common Prefix for the GPU [J] . Mrinal Deo, Sean Keely ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages . 2013,第8期

机译：GPU的并行后缀数组和最少通用前缀
2. Fast parallel skew and prefix-doubling suffix array construction on the GPU [J] . Wang Leyuan, Baxter Sean, Owens John D. Concurrency and computation: practice and experience . 2016,第12期

机译：在GPU上快速并行倾斜和前缀加倍的后缀数组构造
3. An Optimal Parallel Prefix-sums Algorithm on the Memory Machine Models for GPUs [J] . Koji NAKANO 電子情報通信学会技術研究報告 . 2012,第173期

机译：GPU内存机器模型上的最佳并行前缀和算法
4. Parallel Suffix Array and Least Common Prefix for the GPU [C] . Mrinal Deo, Sean Keely ACM SIGPLAN Symposium on Priciples and Practice of Parallel Programming . 2013

机译：PPU的并行后缀数组和最少的常见前缀
5. Fast Parallel Suffix Array on the GPU. [D] . Wang, Leyuan. 2015

机译：GPU上的快速并行后缀数组。
6. Parallel imaging performance investigation of an 8-channel common-mode differential-mode (CMDM) planar array for 7T MRI [O] . Xiaoqing Hu, Xiao Chen, Xin Liu, 2014

机译：用于7T MRI的8通道共模差模（CMDM）平面阵列的并行成像性能研究
7. Parallel distributed memory construction of suffix and longest common prefix arrays [O] . Patrick Flick, Srinivas Aluru 2015

机译：后缀和最长公共前缀阵列的并行分布式内存构造

Parallel Suffix Array and Least Common Prefix for the GPU

摘要

著录项

相似文献

相关主题

期刊订阅