...
首页> 外文期刊>Current Bioinformatics >Euler String-Based Compression of Tree-Structured Data and its Application to Analysis of RNAs
【24h】

Euler String-Based Compression of Tree-Structured Data and its Application to Analysis of RNAs

机译:基于Euler字符串的树木结构数据压缩及其在RNA分析中的应用

获取原文
获取原文并翻译 | 示例
           

摘要

Background: Data compression is essential for efficient large-scale data processing, so that a number of studies have been done. Grammar-based compression is to find a small grammar that generates input data, and it has been used not only for data compression but also for analysis ofbiological data since it is useful for pattern extraction. Objective: Recently, for rooted ordered trees, a special kind of network structures, elementary ordered tree grammar (EOTG) has been defined by extending context-free grammar (CFG) and an integer-programming (IP) method whichfinds the smallest EOTG for input data has also been proposed and applied to extract common pattern of RNA secondary structures. However, the method is not so efficient for large input trees. Therefore, development of an efficient method is important. Methods: We propose an Euler string-basedcompression approach that finds the smallest CFG for the Euler string corresponding to an input rooted ordered tree. Results: From a theoretical viewpoint, we show that there exists a gap of compression ratios between the tree grammar-based approach and Euler string-based approach. Froma practical viewpoint, we show the efficiency and effectiveness of our proposed approach by applying it to comparison of RNA secondary structures. Conclusion: The experimental results indicate that the Euler string-based approach can efficiently compress tree-structured data retainingsome structural information of them.
机译:背景:数据压缩对于高效的大规模数据处理至关重要,从而完成了许多研究。基于语法的压缩是找到产生输入数据的小语法,并且它不仅用于数据压缩,而且用于分析生物数据,因为它对于模式提取是有用的。目的:最近,对于植根有序树,通过扩展无下文语法(CFG)和整数 - 编程(IP)方法来定义一种特殊的网络结构,初级有序树语语法(EOTG),该方法将其用于输入最小的EOTG还提出了数据并施加以提取RNA二级结构的常见模式。但是,该方法对于大型输入树不太有效。因此,发展有效方法很重要。方法:我们提出了一种基于Euler字符串的euler字符串基础方法,该方法为与输入的rooted有序树相对应的euler字符串找到最小的cfg。结果:从理论上看,我们表明基于树语的方法和基于欧拉字符串的方法之间存在压缩比的间隙。从实际的观点来看,我们通过将其应用于比较RNA二级结构来展示我们所提出的方法的效率和有效性。结论:实验结果表明,基于欧拉串的方法可以有效地压缩它们的树结构数据保留组结构信息。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号