Source: U.S. National Institutes of Health (NIH) literature collection

Exploiting Thread-Level and Instruction-Level Parallelism to Cluster Mass Spectrometry Data using Multicore Architectures

Abstract

Modern mass spectrometers can produce large numbers of peptide spectra from complex biological samples in a short time. These data sets contain substantial redundancy, because the same peptide may be selected multiple times in Liquid Chromatography Tandem Mass Spectrometry (LC-MS/MS) experiments. In addition, many spectra cannot be mapped to specific peptide sequences because of their low signal-to-noise (S/N) ratio. Clustering is one way to mitigate these problems in complex mass spectrometry data sets. Recently we presented a graph-theoretic framework, known as CAMS, for clustering large-scale mass spectrometry data. CAMS used a novel metric that exploits the spatial patterns in mass spectrometry peaks, which allowed highly accurate clustering results. However, comparing each spectrum with every other spectrum makes the clustering problem computationally expensive, since the number of comparisons grows quadratically with the number of spectra. In this paper we present a parallel algorithm, called P-CAMS, that uses thread-level and instruction-level parallelism on multicore architectures to substantially decrease running times. P-CAMS relies on intelligent matrix completion to reduce the number of comparisons, on threads running on each core, and on the Single Instruction Multiple Data (SIMD) paradigm inside each thread to exploit massive parallelism on multicore architectures. A carefully crafted load-balancing scheme, which maps the spatial locations of the mass spectrometry peaks to the nearest-level cache and core, allows super-linear speedups. We study the scalability of the algorithm with a wide variety of mass spectrometry data and with variations in architecture-specific parameters. The results show that SIMD-style data parallelism combined with thread-level parallelism on multicore architectures is a powerful combination that substantially reduces runtimes, even for all-to-all comparison algorithms. The quality assessment is performed using a real-world data set, and the results are shown to be consistent with those of the serial version of the same algorithm.
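The all-to-all comparison and load-balancing ideas in the abstract can be illustrated with a small sketch. This is not the P-CAMS implementation. It makes three assumptions. First, it assumes that "matrix completion" at least includes exploiting the symmetry of the similarity matrix, so only the n(n-1)/2 pairs with i < j are computed and each result is mirrored. Second, it substitutes plain cosine similarity over binned intensity vectors for the CAMS metric. Third, it balances work by pair count rather than by the peak-to-cache mapping the abstract describes. The helper names (balanced_row_blocks, cosine, fill_block, all_pairs_similarity) are hypothetical. Row i of the upper triangle holds n-1-i pairs, so equal-sized row blocks would overload the first worker. The sketch therefore splits rows into contiguous blocks with roughly equal pair counts.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def balanced_row_blocks(n, workers):
    """Split rows 0..n-1 of the strict upper triangle into at most `workers`
    contiguous blocks with roughly equal numbers of (i, j > i) pairs.
    Row i contributes n - 1 - i pairs, so early blocks get fewer rows."""
    total = n * (n - 1) // 2
    target = total / workers
    blocks, start, acc = [], 0, 0
    for i in range(n):
        acc += n - 1 - i
        # Close the current block once its cumulative share of pairs is reached.
        if len(blocks) < workers - 1 and acc >= target * (len(blocks) + 1):
            blocks.append((start, i + 1))
            start = i + 1
    blocks.append((start, n))  # the last block takes the remaining rows
    return blocks


def cosine(a, b):
    """Hypothetical stand-in for the CAMS metric: cosine similarity of two
    binned intensity vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def fill_block(spectra, sim, lo, hi):
    """Compute the upper-triangle pairs for rows lo..hi-1 and mirror them.
    Each cell is written by exactly one worker, so no locking is needed."""
    n = len(spectra)
    for i in range(lo, hi):
        for j in range(i + 1, n):
            s = cosine(spectra[i], spectra[j])
            sim[i][j] = s
            sim[j][i] = s  # symmetric completion instead of a second comparison


def all_pairs_similarity(spectra, workers=4):
    """Fill the full n x n similarity matrix using n(n-1)/2 comparisons."""
    n = len(spectra)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0  # self-similarity fixed at 1.0 by convention
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fill_block, spectra, sim, lo, hi)
                   for lo, hi in balanced_row_blocks(n, workers)]
        for f in futures:
            f.result()  # propagate any exception raised in a worker
    return sim


if __name__ == "__main__":
    spectra = [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0], [1.0, 1.0, 0.0], [2.0, 0.0, 4.0]]
    print(balanced_row_blocks(len(spectra), 2))  # [(0, 1), (1, 4)]
    print(all_pairs_similarity(spectra, workers=2)[0][3])
```

In standard CPython the global interpreter lock serializes the pure-Python arithmetic. This sketch therefore demonstrates the work partitioning and the race-free symmetric writes rather than an actual speedup. A native implementation would run the same blocks on separate cores and vectorize the inner similarity loop with SIMD instructions.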