Scalable Detection of Frequent Substrings by Grammar-Based Compression

Masaya NAKAHARA; Shirou MARUYAMA; Tetsuji KUBOYAMA; Hiroshi SAKAMOTO

Abstract

A scalable method for pattern discovery by compression is proposed. A string can be represented by a context-free grammar that derives the string deterministically. In this framework of grammar-based compression, the aim of the algorithm is to output as small a grammar as possible. This optimization problem can be solved approximately. Among such approximation algorithms, the compressor based on edit-sensitive parsing (ESP) is especially suitable for detecting maximal common substrings as well as long frequent substrings. Based on ESP, we design a linear-time algorithm that approximately finds all frequent patterns in a string, and we prove several lower bounds that guarantee the length of the extracted patterns. We also examine the performance of our algorithm in experiments on biological sequences and other compressible real-world texts. Compared to other practical algorithms, our algorithm is faster and more scalable on large, repetitive strings.
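
To make the idea of "a context-free grammar deriving the string deterministically" concrete, the following is a minimal Python sketch. It is not the authors' ESP-based algorithm: it assumes a naive left-to-right pairing of adjacent symbols, level by level, and the function names (`build_grammar`, `expand`, `frequent_substrings`) are hypothetical. It only illustrates how repeated pairs share one nonterminal and how counting nonterminal uses in the parse tree gives a lower bound on how often the derived substring occurs.

```python
from collections import Counter


def build_grammar(text):
    """Naive level-by-level pairing (an assumption for illustration, NOT ESP)."""
    rules = {}          # (left, right) -> nonterminal id (int)
    counts = Counter()  # nonterminal id -> number of nodes in the parse tree
    level = list(text)  # terminals are single-character strings
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            pair = (level[i], level[i + 1])
            # Reuse the nonterminal if this pair was seen before.
            nid = rules.setdefault(pair, len(rules))
            counts[nid] += 1
            nxt.append(nid)
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd leftover symbol moves up unchanged
        level = nxt
    start = level[0] if level else None
    return rules, counts, start


def expand(symbol, by_id):
    """Derive the substring generated by a terminal or nonterminal."""
    if isinstance(symbol, str):
        return symbol
    left, right = by_id[symbol]
    return expand(left, by_id) + expand(right, by_id)


def frequent_substrings(text, min_count=2):
    """Report substrings whose nonterminals occur at least min_count times."""
    rules, counts, _ = build_grammar(text)
    by_id = {nid: pair for pair, nid in rules.items()}
    found = Counter()
    for nid, c in counts.items():
        found[expand(nid, by_id)] += c
    return {s: c for s, c in found.items() if c >= min_count}
```

Calling `frequent_substrings("abab" * 4)` reports repeated blocks such as `"ab"` and `"abab"`. Fixed pairing of this kind is sensitive to alignment: in `"abcabc"` the two copies of `"abc"` are cut at different positions and share no nonterminal. Parsing common substrings more consistently is the kind of property the abstract attributes to ESP.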

Bibliographic record

Source
IEICE Transactions on Information and Systems | 2013, No. 3 | pp. 457-464 | 8 pages
Authors
Masaya NAKAHARA; Shirou MARUYAMA; Tetsuji KUBOYAMA; Hiroshi SAKAMOTO
Author affiliations

The author is with Gakushuin University, Tokyo, 171-8588 Japan;

The author is with Kyushu University, Fukuoka-shi, 819-0395 Japan;

The authors are with Kyushu Institute of Technology, Iizuka-shi, 820-8502 Japan;

The author is with JST PRESTO, Kawaguchi-shi, 332-0012 Japan;

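Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Chinese Library Classification: Radio electronics; telecommunications technology
Keywords
pattern discovery; grammar-based compression; edit-sensitive parsing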
