Performance evaluation of concurrent collections on high-performance multicore computing systems


Abstract

This paper is the first extensive performance study of a recently proposed parallel programming model, called Concurrent Collections (CnC). In CnC, the programmer expresses her computation in terms of application-specific operations, partially-ordered by semantic scheduling constraints. The CnC model is well-suited to expressing asynchronous-parallel algorithms, so we evaluate CnC using two dense linear algebra algorithms in this style for execution on state-of-the-art multicore systems: (i) a recently proposed asynchronous-parallel Cholesky factorization algorithm, (ii) a novel and non-trivial “higher-level” partly-asynchronous generalized eigensolver for dense symmetric matrices. Given a well-tuned sequential BLAS, our implementations match or exceed competing multithreaded vendor-tuned codes by up to 2.6×. Our evaluation compares with alternative models, including ScaLAPACK with a shared memory MPI, OpenMP, Cilk++, and PLASMA 2.0, on Intel Harpertown, Nehalem, and AMD Barcelona systems. Looking forward, we identify new opportunities to improve the CnC language and runtime scheduling and execution.
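To make the idea of operations "partially-ordered by semantic scheduling constraints" concrete, here is a minimal sketch in plain Python. It is not the CnC API and not the authors' implementation. It lists the tile-level tasks of a right-looking tiled Cholesky factorization and records which tasks each one must wait for. The task labels `potrf` and `trsm` are borrowed from the LAPACK/BLAS routine names; `update` is a hypothetical label standing for the trailing-matrix update. The helper names `cholesky_tasks` and `schedule_waves` are assumptions made for illustration. The wave-by-wave schedule is a simplification: an asynchronous runtime would start each task as soon as its own prerequisites finish rather than waiting for a whole wave.

```python
def cholesky_tasks(n_tiles):
    """Map each task of a tiled Cholesky factorization to its prerequisites.

    Tasks are tuples: ("potrf", k), ("trsm", k, i), ("update", k, i, j).
    Only dependencies are modeled; no numerical work is done.
    """
    deps = {}
    for k in range(n_tiles):
        # Factor diagonal tile (k, k) once its last trailing update is applied.
        deps[("potrf", k)] = {("update", k - 1, k, k)} if k > 0 else set()
        for i in range(k + 1, n_tiles):
            # Triangular solve for tile (i, k) below the diagonal.
            needs = {("potrf", k)}
            if k > 0:
                needs.add(("update", k - 1, i, k))
            deps[("trsm", k, i)] = needs
        for i in range(k + 1, n_tiles):
            for j in range(k + 1, i + 1):
                # Trailing update of tile (i, j) from the solved tiles of column k.
                needs = {("trsm", k, i), ("trsm", k, j)}
                if k > 0:
                    needs.add(("update", k - 1, i, j))
                deps[("update", k, i, j)] = needs
    return deps


def schedule_waves(deps):
    """Group tasks into waves; tasks in one wave have no ordering between them."""
    done, waves = set(), []
    remaining = dict(deps)
    while remaining:
        ready = sorted(t for t, needs in remaining.items() if needs <= done)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(schedule_waves(cholesky_tasks(3))):
        print(number, wave)
```

Tasks that land in the same wave are unordered with respect to each other, which is the parallelism a dependency-driven runtime can exploit without the programmer writing explicit synchronization.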


Bibliographic record

Source
2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS) | 2010 | pp. 1-12 | 12 pages
Conference location: Atlanta, GA (US)
Authors
Chandramowlishwaran, Aparna; Knobe, Kathleen; Vuduc, Richard
Affiliation
College of Computing, Georgia Institute of Technology, Atlanta, GA
Original format: PDF
Language: English
Chinese Library Classification: TP311.133
