Journal of Parallel and Distributed Computing

Handling biological sequence alignments on networked computing systems: A divide-and-conquer approach


           

Abstract

In this paper, we address the biological sequence alignment problem, one of the most commonly used steps in several bioinformatics applications. We employ the Divisible Load Theory (DLT) paradigm, which is suited to large-scale processing on network-based systems, to achieve a high degree of parallelism. Using the DLT paradigm, we propose a strategy that carefully partitions the computational workload among the processors in the system so as to minimize the overall time needed to determine the maximum similarity between DNA/protein sequences. We consider handling this computational problem on networked computing platforms connected as a linear daisy chain. We derive the individual load quantum to be assigned to each processor according to the computation speeds and communication link speeds along the chain. We consider two cases of sequence alignment, in which post-processing (the trace-back process required to determine an optimal alignment) may or may not be done at the individual processors in the system. We derive critical conditions that determine whether our strategies can yield an optimal processing time. When these conditions cannot be satisfied, we apply three different heuristic strategies proposed in the literature to generate sub-optimal solutions for the processing time. To validate the proposed schemes, we use real DNA sequences of the house mouse mitochondrion and the human mitochondrion, obtained from the public database GenBank, in our simulation experiments. Through this study, we conclusively demonstrate the applicability and potential of the DLT paradigm for such biological sequence related computational problems.
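To make the idea of assigning load "according to computation and communication link speeds along the chain" concrete, the following is a minimal sketch of a textbook divisible-load partition on a linear daisy chain. It is not the paper's derivation: it ignores the data dependencies of the alignment matrix and the trace-back phase, and it assumes that the load enters at one end of the chain, that each processor receives its whole share before computing, and that each processor can compute while forwarding the remainder. The function name `chain_fractions` and the parameters `w` (compute time per unit load) and `z` (link time per unit load) are hypothetical names chosen for illustration.

```python
def chain_fractions(w, z):
    """Split one unit of divisible load over a linear chain of processors.

    Assumptions (illustrative, not taken from the paper):
      w[i] = time for processor i to compute one unit of load
      z[i] = time to send one unit of load from processor i to i + 1
      Load starts at processor 0; every processor finishes at the same instant.
    Returns (fractions, makespan).
    """
    m = len(w)
    if len(z) != m - 1:
        raise ValueError("need exactly one link between each pair of neighbours")

    # Walk from the far end of the chain toward processor 0, collapsing the
    # tail into a single "equivalent processor" with unit time w_eq.
    w_eq = w[-1]
    keep = [1.0]  # the last processor keeps everything it receives
    for i in range(m - 2, -1, -1):
        downstream = z[i] + w_eq  # time per unit load handed downstream
        # Equal finish times: a * w[i] == (1 - a) * downstream
        a = downstream / (w[i] + downstream)
        keep.append(a)
        w_eq = a * w[i]
    keep.reverse()

    # Convert "share of what I receive" into shares of the total load.
    fractions = []
    remaining = 1.0
    for a in keep:
        fractions.append(remaining * a)
        remaining *= 1.0 - a
    return fractions, w_eq


if __name__ == "__main__":
    shares, makespan = chain_fractions([2.0, 2.0], [1.0])
    print(shares, makespan)
```

In this simplified model, a slower link shifts load toward the processors nearer the source, which is the qualitative behaviour the abstract describes; the conditions under which such a partition is optimal for sequence alignment are what the paper itself derives.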
