Genome Assembly on a Multicore System

Abstract

The genome assembly problem is to reconstruct the original DNA sequence of an organism from a large set of short (400bp-500bp) overlapping fragments. Assembly is particularly challenging in the presence of repeats, which are multiple identical or nearly identical stretches of DNA. MIRA is an open source assembler that is widely used by biologists and works effectively in the presence of repeats. However, it is computationally intensive: for example, an assembly of one million fragments requires about 18.3 hours. The computation in the MIRA assembler is dominated by the contig building phase, which is highly sequential in nature. In this paper, we propose a modification to the MIRA assembler that allows this computation to be parallelized while maintaining the quality of the assembly. We implemented the modified MIRA assembler on a 64-core system with eight Intel(R) Xeon(R) X7560 processors. We were able to speed up the contig building phase by a factor of 55 on the 64-core system. Additionally, we parallelized the other phases of the MIRA assembler and were able to reduce the total execution time of assembly from 18.3 hours (sequential) to 3.4 hours (speedup of 5.57) without sacrificing assembly quality. It is worth noting that the overall speedup is limited by Amdahl's Law, as parts of the original MIRA assembler are inherently sequential. For example, for one million reads the sequential portion of the MIRA assembler takes about 2.78 hours doing I/O or other operations, which limits the overall speedup to 6.58.
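The Amdahl's Law bound quoted in the abstract can be checked with a short calculation. The following is a minimal sketch, not code from the paper: the function name `amdahl_speedup` and its parameters are hypothetical, and only the 18.3-hour total and the 2.78-hour sequential portion are taken from the abstract. It assumes the sequential portion cannot be sped up at all and that the remaining time is perfectly parallelizable.

```python
def amdahl_speedup(total_hours, serial_hours, parallel_speedup=float("inf")):
    """Overall speedup when only the non-serial part of a run is accelerated.

    total_hours      -- sequential run time of the whole program
    serial_hours     -- time spent in the part that cannot be parallelized
    parallel_speedup -- factor applied to the remaining time
                        (infinity gives the Amdahl upper bound)
    """
    if not 0 < serial_hours <= total_hours:
        raise ValueError("serial_hours must be in (0, total_hours]")
    parallel_hours = total_hours - serial_hours
    new_total = serial_hours + parallel_hours / parallel_speedup
    return total_hours / new_total


# Figures from the abstract: 18.3 h total, about 2.78 h inherently sequential.
bound = amdahl_speedup(18.3, 2.78)
print(f"Upper bound on overall speedup: {bound:.2f}")
```

With an unbounded speedup of the parallel part, the bound reduces to total time divided by sequential time, which is how the 6.58 figure in the abstract arises.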

Bibliographic details

Source
12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 2013, 8 pages
Authors
Abhishek Biswas; Desh Ranjan; Mohammad Zubair
Original format: PDF
Chinese Library Classification: TP393.08-53
Keywords
component; Parallel Genome Assembly; Multicore parallelism; OpenMP and OLC Graph Model;
