首页> 外文期刊>IEICE transactions on information and systems >Scalable Detection of Frequent Substrings by Grammar-Based Compression
【24h】

Scalable Detection of Frequent Substrings by Grammar-Based Compression

机译:Scalable Detection of Frequent Substrings by Grammar-Based Compression

获取原文
获取原文并翻译 | 示例
           

摘要

A scalable pattern discovery by compression is proposed. A string is representable by a context-free grammar deriving the string de-terministically. In this framework of grammar-based compression, the aim of the algorithm is to output as small a grammar as possible. Beyond that, the optimization problem is approximately solvable. In such approximation algorithms, the compressor based on edit-sensitive parsing (ESP) is especially suitable for detecting maximal common substrings as well as long frequent substrings. Based on ESP, we design a linear time algorithm to find all frequent patterns in a string approximately and prove several lower bounds to guarantee the length of extracted patterns. We also examine the performance of our algorithm by experiments in biological sequences and other compressible real world texts. Compared to other practical algorithms, our algorithm is faster and more scalable with large and repetitive strings.

著录项

获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号