Conference: IEEE International Parallel and Distributed Processing Symposium Workshops

Accelerating Clustering using Approximate Spanning Tree and Prime Number Based Filter

Abstract

Motivation: Clustering genomic data, including data generated by high-throughput sequencing, is an important preliminary step for assembly and analysis. However, clustering a large number of sequences is time-consuming.

Methods: In this paper, we discuss algorithmic performance improvements to our existing clustering system, PEACE, via two new approaches: (1) an Approximate Spanning Tree (AST), which is computed much faster than the Minimum Spanning Tree (MST) currently used, and (2) a novel Prime Number based Heuristic (PNH) for generating and comparing features, which further reduces comparison overhead.

Results: Experiments conducted on a variety of datasets show that the proposed methods significantly improve performance for datasets with large clusters, with only minimal degradation in clustering quality. We also compare our methods against wcd-kaboom, a state-of-the-art clustering tool. Our experiments show that AST and PNH underperform wcd-kaboom on datasets with many small clusters. However, they significantly outperform wcd-kaboom on datasets with large clusters, by approximately 550×, with comparable clustering quality. The results indicate that the proposed methods hold considerable promise for accelerating the clustering of genomic data with large clusters.
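The abstract does not describe how AST or PNH are implemented, so the Python sketch below is only a hypothetical illustration of the general ideas, not PEACE's code. It assumes: a prime per nucleotide (A=2, C=3, G=5, T=7), a window width of 4, a feature defined as the product of the primes in each window (by unique factorization, the product identifies the window's base composition regardless of order), a signature-overlap distance, and a greedy "attach to the first earlier node within the threshold" tree as one possible approximate spanning tree. All function names are hypothetical. The exact version uses Prim's algorithm; in both versions, clusters are the connected components left after cutting tree edges heavier than the threshold.

```python
from collections import Counter

# Hypothetical prime assignment per nucleotide (assumption; only A/C/G/T handled).
PRIMES = {"A": 2, "C": 3, "G": 5, "T": 7}

def prime_signatures(seq, w=4):
    """Multiset of window features: product of per-base primes in each window."""
    sigs = Counter()
    for i in range(len(seq) - w + 1):
        p = 1
        for base in seq[i:i + w]:
            p *= PRIMES[base]
        sigs[p] += 1
    return sigs

def distance(a, b):
    """Hypothetical distance: 1 - shared signatures / larger signature count."""
    shared = sum((a & b).values())  # Counter '&' keeps the minimum count per key
    total = max(sum(a.values()), sum(b.values()), 1)
    return 1.0 - shared / total

def exact_mst(n, d):
    """Prim's algorithm; evaluates d() for every pair, n*(n-1)/2 times."""
    in_tree, best, parent = [False] * n, [float("inf")] * n, [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                w = d(u, v)
                if w < best[v]:
                    best[v], parent[v] = w, u
    return edges

def approx_tree(n, d, threshold):
    """Greedy tree: stop scanning at the first earlier node within threshold,
    otherwise attach to the nearest earlier node (an illustrative stand-in for AST)."""
    edges = []
    for v in range(1, n):
        best_u, best_w = 0, float("inf")
        for u in range(v):
            w = d(u, v)
            if w < best_w:
                best_u, best_w = u, w
            if w <= threshold:
                break  # close enough: skip the remaining comparisons
        edges.append((best_u, v, best_w))
    return edges

def clusters(n, edges, threshold):
    """Cut edges heavier than threshold; connected components are the clusters."""
    root = list(range(n))
    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x
    for u, v, w in edges:
        if w <= threshold:
            root[find(u)] = find(v)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy data: one four-sequence cluster and one two-sequence cluster.
seqs = ["ACGTACGTAC", "ACGTACGTAA", "ACGTACGTAC", "ACGTACGTAA",
        "GGGGCCCCGG", "GGGGCCCCGC"]
feats = [prime_signatures(s) for s in seqs]
calls = {"n": 0}

def d(u, v):
    calls["n"] += 1  # count distance evaluations
    return distance(feats[u], feats[v])

THRESHOLD = 0.3
exact_groups = clusters(len(seqs), exact_mst(len(seqs), d), THRESHOLD)
exact_calls, calls["n"] = calls["n"], 0
approx_groups = clusters(len(seqs), approx_tree(len(seqs), d, THRESHOLD), THRESHOLD)
approx_calls = calls["n"]
print(exact_groups, exact_calls)
print(approx_groups, approx_calls)
```

On this toy input both trees yield the clusters [[0, 1, 2, 3], [4, 5]], but the greedy tree makes 12 distance evaluations instead of 15. In this sketch the saving comes from members of a large cluster finding a close neighbour early, which loosely mirrors the abstract's observation that the gains concentrate on datasets with large clusters. The trade-off is that the greedy tree can split clusters that an exact MST cut would merge, one way "minimal degradation in clustering quality" can arise.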