首页> 外文会议>ACM SIGPLAN Symposium on Priciples and Practice of Parallel Programming >Parallel Suffix Array and Least Common Prefix for the GPU
【24h】

Parallel Suffix Array and Least Common Prefix for the GPU

机译:PPU的并行后缀数组和最少的常见前缀

获取原文

摘要

Suffix Array (SA) is a data structure formed by sorting the suffixes of a string into lexicographic order. SAs have been used in a variety of applications, most notably in pattern matching and Burrows-Wheeler Transform (BWT) based lossless data compression. SAs have also become the data structure of choice for many, if not all, string processing problems to which suffix tree methodology is applicable. Over the last two decades researchers have proposed many suffix array construction algorithm (SACAs). We do a systematic study of the main classes of SACAs with the intent of mapping them onto a data parallel architecture like the GPU. We conclude that skew algorithm [12], a linear time recursive algorithm, is the best candidate for GPUs as all its phases can be efficiently mapped to a data parallel hardware. Our OpenCL implementation of skew algorithm achieves a throughput of up to 25 MStrings/sec and a speedup of up to 34× and 5.8× over a single threaded CPU implementation using a discrete GPU and APU respectively. We also compare our OpenCL implementation against the fastest known CPU implementation based on induced copying and achieve a speedup of up to 3.7×. Using SA we construct BWT on GPU and achieve a speedup of 11× over the fastest known BWT on GPU. Suffix arrays are often augmented with the longest common prefix (LCP) information. We design a novel high-performance parallel algorithm for computing LCP on the GPU. Our GPU implementation of LCP achieves a speedup of up to 25× and 4.3× on discrete GPU and APU respectively.
机译:后缀阵列(SA)是通过将字符串的后缀分类为词典顺序而形成的数据结构。 SAS已被用于各种应用中,最值得注意的是,基于模式匹配和挖掘机轮车变换(BWT)的无损数据压缩。 SAS还成为许多,如果不是全部的选择的数据结构,则为后缀树方法的字符串处理问题。在过去的二十年中,研究人员提出了许多后缀阵列施工算法(SACAS)。我们对SACA的主要类别进行了系统的研究,目的是将它们映射到像GPU这样的数据并行架构上。我们得出结论,偏斜算法[12],线性时间递归算法是GPU的最佳候选者,因为它的所有相位可以有效地映射到数据并行硬件。我们的OpenCL实现偏斜算法的实现可以使用分立的GPU和APU的单线CPU实现实现多达25辆/秒的吞吐量,最多34×和5.8倍。我们还基于诱导的复制,将OpenCL实现与最快的已知CPU实现进行比较,并达到高达3.7×的加速。使用SA在GPU上构建BWT并在GPU上最快的已知BWT实现11倍的加速。后缀阵列通常使用最长的常见前缀(LCP)信息来增强。我们设计了一种用于在GPU上计算LCP的新型高性能并行算法。我们的GPU实施LCP,分别在离散GPU和APU上实现了高达25倍和4.3倍的加速。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号