
Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments


Abstract

Bioinformatics currently faces very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take impractically long to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly CPU (central processing unit) intensive. Although the BLAST suite of programs performs searches very rapidly, the searches can be accelerated further. In recent years, distributed computing environments have become more widely accessible and used because of the increasing availability of high-performance computing (HPC) systems. Simple solutions for data parallelization are therefore needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. To accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and overcomes the speed limitation of single-node BLAST programs. It can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. This freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.
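The query sequence distribution approach mentioned in the abstract can be illustrated with a minimal sketch. This is not DCBLAST's own code: the function names `parse_fasta`, `split_queries`, and `to_fasta` are hypothetical, and the round-robin assignment is an assumption made only for illustration. The idea is that a multi-sequence FASTA query is divided into chunks, each chunk could then be searched as an independent BLAST job on its own core or node, and the per-chunk outputs could be concatenated afterwards.

```python
from typing import Iterable, Iterator, List, Tuple

# A record is (header without the leading '>', sequence with line breaks removed).
Record = Tuple[str, str]


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    seq_parts: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(seq_parts)
            header, seq_parts = line[1:], []
        elif header is not None:
            seq_parts.append(line)
    if header is not None:
        yield header, "".join(seq_parts)


def split_queries(records: Iterable[Record], n_chunks: int) -> List[List[Record]]:
    """Distribute query records round-robin into at most n_chunks groups.

    Hypothetical helper: each returned group stands for one independent
    search job. Empty groups are dropped when there are fewer records
    than requested chunks.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[Record]] = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append(rec)
    return [c for c in chunks if c]


def to_fasta(records: Iterable[Record]) -> str:
    """Serialize records back to FASTA text, one sequence per line."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


if __name__ == "__main__":
    demo = ">q1\nACGT\nACGT\n>q2\nTTGA\n>q3\nGGCC\n"
    for n, chunk in enumerate(split_queries(parse_fasta(demo), 2)):
        print(f"--- chunk {n} ---")
        print(to_fasta(chunk), end="")
```

Because each query sequence is searched against the database independently, splitting by query keeps the chunks self-contained. How chunks are submitted to a scheduler and how results are merged is left out here, since the abstract does not specify it.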
