Asia-Pacific Bioinformatics Conference

Fast and accurate clustering of noncoding RNAs using ensembles of sequence alignments and secondary structures

Abstract

Background: Clustering unannotated transcripts is an important step in identifying novel families of noncoding RNAs (ncRNAs). Several hierarchical clustering methods have been developed with similarity measures based on structural alignment scores. However, exact structural alignment is computationally expensive, so these methods must rely on approximate algorithms. Such heuristics degrade the quality of the clustering results, especially when the similarity among family members is not detectable at the primary sequence level.

Results: We describe a new similarity measure for the hierarchical clustering of ncRNAs. The idea is that the reliability of approximate algorithms can be improved by using the information in the suboptimal solutions of their dynamic programming frameworks. We approximate structural alignment in a simpler manner than the existing methods. In exchange, our method uses all possible sequence alignments and all possible secondary structures, whereas the existing methods use only one optimal sequence alignment and one optimal secondary structure. We demonstrate that this strategy can achieve the best balance between computational cost and clustering quality. In particular, our method maintains its high performance even when the sequence identity of family members is less than 60%.

Conclusions: Our method enables fast and accurate clustering of ncRNAs. The software is available for download at http://bpla-kernel.dna.bio.keio.ac.jp/clustering/.
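The abstract does not give the similarity measure itself, so the following is only a minimal sketch of the downstream step: hierarchical clustering driven by a precomputed pairwise similarity matrix. It uses generic average-linkage agglomeration, which is a swapped-in technique and not necessarily the linkage the authors use. The function name `average_linkage`, the sequence names, the similarity values, and the stopping threshold are all hypothetical and chosen for illustration.

```python
from itertools import combinations

# Hypothetical toy data: the names and similarity scores are invented for
# illustration and are NOT produced by the method described in the abstract.
NAMES = ["tRNA_a", "tRNA_b", "miRNA_a", "miRNA_b"]
SIMILARITY = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]


def average_linkage(similarity, names, threshold):
    """Agglomerative clustering on a symmetric similarity matrix.

    Repeatedly merges the two most similar clusters and stops when the
    best remaining pair scores below `threshold` (an assumed cut-off).
    """
    clusters = [[i] for i in range(len(names))]

    def cluster_similarity(a, b):
        # Average linkage: mean similarity over all cross-cluster pairs.
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters with the highest average similarity.
        (x, y), best = max(
            (
                ((p, q), cluster_similarity(clusters[p], clusters[q]))
                for p, q in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)

    return [sorted(names[i] for i in c) for c in clusters]


if __name__ == "__main__":
    for family in average_linkage(SIMILARITY, NAMES, threshold=0.5):
        print(family)
```

In this sketch, any pairwise score can be plugged in as `SIMILARITY`; the quality of the resulting families then depends entirely on how well that score reflects shared structure, which is the part the abstract's method addresses.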
