【24h】

Engineering a Lightweight Suffix Array Construction Algorithm

机译:设计轻量级后缀数组构造算法

获取原文
获取原文并翻译 | 示例

摘要

We consider the problem of computing the suffix array of a text T[l,n]. This problem consists in sorting the suffixes of T in lexicographic order. The suffix array (or PAT array) is a simple, easy to code, and elegant data structure used for several fundamental string matching problems involving both linguistic texts and biological data. Recently, the interest in this data structure has been revitalized by its use as a building block for three novel applications: (1) the Burrows-Wheeler compression algorithm, which is a provably and practically effective compression tool; (2) the construction of succinct and compressed indexes; the latter can store both the input text and its full-text index using roughly the same space used by traditional compressors for the text alone; and (3) algorithms for clustering and ranking the answers to user queries in web-search engines. In all these applications the construction of the suffix array is the computational bottleneck both in time and space. This motivated our interest in designing yet another suffix array construction algorithm which is fast and "lightweight" in the sense that it uses small space.
机译:我们考虑计算文本T [l,n]的后缀数组的问题。这个问题在于按字典顺序对T的后缀进行排序。后缀数组(或PAT数组)是一种简单,易于编码且优雅的数据结构,用于一些涉及语言文本和生物学数据的基本字符串匹配问题。最近,通过将其用作三个新颖应用程序的构建基块,使人们对该数据结构的兴趣得到了重新激发:(1)Burrows-Wheeler压缩算法,它是一种可证明且实用的压缩工具; (2)简洁和压缩索引的构造;后者可以使用传统压缩器仅用于文本的大致相同空间来存储输入文本及其全文索引; (3)用于在网络搜索引擎中对用户查询的答案进行聚类和排名的算法。在所有这些应用程序中,后缀数组的构造都是时间和空间上的计算瓶颈。这激发了我们对设计另一种后缀数组构造算法的兴趣,该算法在占用空间较小的意义上又快速又“轻巧”。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号