Conference: 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Performance evaluation of concurrent collections on high-performance multicore computing systems



Abstract

This paper is the first extensive performance study of a recently proposed parallel programming model, called Concurrent Collections (CnC). In CnC, the programmer expresses her computation in terms of application-specific operations, partially-ordered by semantic scheduling constraints. The CnC model is well-suited to expressing asynchronous-parallel algorithms, so we evaluate CnC using two dense linear algebra algorithms in this style for execution on state-of-the-art multicore systems: (i) a recently proposed asynchronous-parallel Cholesky factorization algorithm, (ii) a novel and non-trivial “higher-level” partly-asynchronous generalized eigensolver for dense symmetric matrices. Given a well-tuned sequential BLAS, our implementations match or exceed competing multithreaded vendor-tuned codes by up to 2.6×. Our evaluation compares with alternative models, including ScaLAPACK with a shared memory MPI, OpenMP, Cilk++, and PLASMA 2.0, on Intel Harpertown, Nehalem, and AMD Barcelona systems. Looking forward, we identify new opportunities to improve the CnC language and runtime scheduling and execution.
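As a rough illustration of what "operations partially ordered by scheduling constraints" can mean for Cholesky factorization, the sketch below fires small steps whenever their input items are available. It is not the CnC API and not the paper's implementation: the function name `cholesky_dataflow`, the item keys, and the tiny scheduler loop are all assumptions made for this example, scalar matrix entries stand in for the tiles a real code would use, and the steps run sequentially in a shuffled order only to show that the data dependencies alone determine the result.

```python
import math
import random


def cholesky_dataflow(a, seed=0):
    """Return the lower Cholesky factor of the SPD matrix `a` (list of lists).

    Hypothetical sketch: every step declares its input items and one output
    item; a step may fire as soon as all of its inputs exist.
    """
    n = len(a)
    # Item collection: ("A", i, j, k) is entry (i, j) after k updates,
    # ("L", i, j) is a finished entry of the factor. Each key is written once.
    items = {("A", i, j, 0): a[i][j] for i in range(n) for j in range(i + 1)}

    steps = []  # each step is (input_keys, output_key, function)
    for k in range(n):
        # Factor the diagonal entry.
        steps.append(([("A", k, k, k)], ("L", k, k), lambda d: math.sqrt(d)))
        for i in range(k + 1, n):
            # Scale the entries below the diagonal.
            steps.append(([("A", i, k, k), ("L", k, k)], ("L", i, k),
                          lambda x, d: x / d))
            for j in range(k + 1, i + 1):
                # Update the trailing submatrix.
                steps.append(([("A", i, j, k), ("L", i, k), ("L", j, k)],
                              ("A", i, j, k + 1),
                              lambda x, p, q: x - p * q))

    # Shuffle so that program order carries no information.
    random.Random(seed).shuffle(steps)

    pending = steps
    while pending:
        ready = [s for s in pending if all(key in items for key in s[0])]
        if not ready:
            raise ValueError("no step can fire; a dependency is missing")
        for inputs, out, fn in ready:
            assert out not in items  # single assignment of every item
            items[out] = fn(*(items[key] for key in inputs))
        pending = [s for s in pending if s[1] not in items]

    return [[items[("L", i, j)] if j <= i else 0.0 for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    A = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
    for row in cholesky_dataflow(A):
        print(row)
```

Because each item is written exactly once and a step waits only on the items it names, any firing order that respects those dependencies yields the same factor. That property is what would let a runtime execute independent steps concurrently.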

