Conference: IEEE International Conference on Bioinformatics and Bioengineering

Streaming Distributed DNA Sequence Alignment Using Apache Spark

Abstract

Next-Generation Sequencing (NGS) technology generates large amounts of data, usually on the order of hundreds of gigabytes per experiment, which must be analyzed quickly to produce meaningful variant results. The first step in analyzing such data is to map the sequenced reads to their corresponding positions in the human genome. One of the most popular tools for this sequence alignment is the Burrows-Wheeler Aligner (BWA mem). One limitation of the BWA program, though, is that it cannot be run on a cluster. In this paper, we propose StreamBWA, a new framework that allows the BWA mem program to run on a cluster in a distributed fashion while the input data is still being streamed into the cluster. It can process the input data directly from a compressed file, which can reside either on the local file system or at a URL. Moreover, StreamBWA can start combining the output files of the distributed BWA mem tasks while these tasks are still being executed on the cluster. Empirical evaluation shows that this streaming distributed approach is approximately 2x faster than the non-streaming approach. Furthermore, our streaming distributed approach is 5x faster than other state-of-the-art solutions such as SparkBWA. The source code of StreamBWA is publicly available at https://github.com/HamidMushtaq/StreamBWA.
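
The streaming idea in the abstract (cut a compressed input into chunks as it is read, hand each chunk to a worker, and merge outputs while later chunks are still running) can be illustrated with a minimal single-machine sketch. This is not the StreamBWA implementation and does not use Spark or BWA: `fastq_chunks`, `fake_align`, and `stream_align` are hypothetical names, `fake_align` is a stand-in that only reports each read's length, and the chunk size and thread pool are assumptions chosen for illustration.

```python
import gzip
import io
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def fastq_chunks(stream, reads_per_chunk):
    """Yield lists of FASTQ lines (4 lines per read) as soon as each chunk is read."""
    while True:
        lines = list(islice(stream, 4 * reads_per_chunk))
        if not lines:
            return
        yield lines


def fake_align(chunk):
    """Hypothetical stand-in for an aligner: emit 'name<TAB>read length' per read."""
    out = []
    for i in range(0, len(chunk), 4):
        name = chunk[i].strip().lstrip("@")
        seq = chunk[i + 1].strip()
        out.append(f"{name}\t{len(seq)}")
    return out


def stream_align(stream, reads_per_chunk=2, workers=4):
    """Submit chunks while the input is still being decompressed, then merge
    results in input order; early results are yielded while later tasks may
    still be running."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fake_align, chunk)
                   for chunk in fastq_chunks(stream, reads_per_chunk)]
        for future in futures:
            yield from future.result()


if __name__ == "__main__":
    # Build a tiny gzip-compressed FASTQ file in memory (made-up reads).
    raw = b"".join(f"@r{i}\nACGT\n+\nIIII\n".encode() for i in range(5))
    with gzip.open(io.BytesIO(gzip.compress(raw)), "rt") as fastq:
        for line in stream_align(fastq):
            print(line)
```

Reading through `gzip.open` in text mode decompresses incrementally, so the first chunks can be dispatched before the whole file has been read. A cluster version would replace the thread pool with distributed tasks and the stand-in with real aligner invocations.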
