Scalable Detection of Frequent Substrings by Grammar-Based Compression

Masaya NAKAHARA; Shirou MARUYAMA; Tetsuji KUBOYAMA; Hiroshi SAKAMOTO

Abstract

We propose a scalable method for pattern discovery by compression. A string can be represented by a context-free grammar that derives the string deterministically. In this framework of grammar-based compression, the aim of the algorithm is to output as small a grammar as possible, and this optimization problem can be solved approximately. Among such approximation algorithms, the compressor based on edit-sensitive parsing (ESP) is especially suitable for detecting maximal common substrings as well as long frequent substrings. Based on ESP, we design a linear-time algorithm that approximately finds all frequent patterns in a string, and we prove several lower bounds that guarantee the length of the extracted patterns. We also evaluate the performance of our algorithm in experiments on biological sequences and other compressible real-world texts. Compared to other practical algorithms, our algorithm is faster and more scalable on large and repetitive strings.
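To make the idea of grammar-based pattern discovery concrete, here is a minimal sketch in Python. It is not the ESP-based algorithm of the paper: it swaps in a naive Re-Pair-style pairing heuristic (repeatedly replacing the most frequent adjacent pair), runs in quadratic time rather than linear time, and carries none of the paper's length guarantees. The function names `build_grammar`, `expand`, and `frequent_patterns` are hypothetical and chosen only for this illustration. The sketch shows how nonterminals of a grammar that derives exactly one string can serve as candidates for frequent substrings.

```python
from collections import Counter


def build_grammar(text):
    """Return (start_sequence, rules) for a grammar that derives exactly `text`.

    Terminals are one-character strings; nonterminals are integers.
    Assumption: a simple Re-Pair-style heuristic, not ESP.
    """
    seq = list(text)
    rules = {}  # nonterminal id -> (left symbol, right symbol)
    while len(seq) >= 2:
        # Count adjacent pairs naively (overlapping pairs, as in "aaa", are over-counted).
        pairs = Counter(zip(seq, seq[1:]))
        pair = max(pairs, key=pairs.get)  # on ties, the first pair seen wins
        if pairs[pair] < 2:
            break
        nt = len(rules)
        rules[nt] = pair
        # Rewrite the sequence left to right, replacing each occurrence of the pair.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Return the substring of the original text derived by `symbol`."""
    if isinstance(symbol, str):
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


def frequent_patterns(text, min_len=2):
    """Map each nonterminal's expansion (of length >= min_len) to its
    number of non-overlapping occurrences in `text`."""
    _, rules = build_grammar(text)
    found = {}
    for nt in rules:
        pattern = expand(nt, rules)
        if len(pattern) >= min_len:
            found[pattern] = text.count(pattern)
    return found


text = "abracadabra abracadabra"
for pattern, count in frequent_patterns(text, min_len=4).items():
    print(repr(pattern), count)
```

Because every rule is created only when a pair repeats, long repeated substrings tend to end up as the expansion of a single nonterminal, which is the intuition behind using a compressor as a pattern detector.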


Bibliographic record

Source
IEICE Transactions on Information and Systems, 2013, No. 3, pp. 457-464 (8 pages)
Authors
Masaya NAKAHARA; Shirou MARUYAMA; Tetsuji KUBOYAMA; Hiroshi SAKAMOTO
Affiliations

The authors are with Kyushu Institute of Technology, Iizuka-shi, 820-8502 Japan;

The author is with Kyushu University, Fukuoka-shi, 819-0395 Japan;

The author is with Gakushuin University, Tokyo, 171-8588 Japan;

The authors are with Kyushu Institute of Technology, Iizuka-shi, 820-8502 Japan. The author is with JST PRESTO, Kawaguchi-shi, 332-0012 Japan;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Keywords
pattern discovery; grammar-based compression; edit-sensitive parsing;


