...
首页> 外文期刊>International Journal of High Performance Computing Applications >Approximate weighted matching on emerging manycore and multithreaded architectures
【24h】

Approximate weighted matching on emerging manycore and multithreaded architectures

机译:新兴的多核和多线程体系结构上的近似加权匹配

获取原文
获取原文并翻译 | 示例
           

摘要

Graph matching is a prototypical combinatorial problem with many applications in high-performance scientific computing. Optimal algorithms for computing matchings are challenging to parallelize. Approximation algorithms are amenable to parallelization and are therefore important to compute matchings for large-scale problems. Approximation algorithms also generate nearly optimal solutions that are sufficient for many applications. In this paper we present multithreaded algorithms for computing half-approximate weighted matching on state-of-the-art multicore (Intel Nehalem and AMD Magny-Cours), manycore (Nvidia Tesla and Nvidia Fermi), and massively multithreaded (Cray XMT) platforms. We provide two implementations: the first uses shared work queues and is suited for all platforms; and the second implementation, based on dataflow principles, exploits special features available on the Cray XMT. Using a carefully chosen dataset that exhibits characteristics from a wide range of applications, we show scalable performance across different platforms. In particular, for one instance of the input, an R-MAT graph (RMAT-G), we show speedups of about 32 on 48 cores of an AMD Magny-Cours, 7 on 8 cores of Intel Nehalem, 3 on Nvidia Tesla and 10 on Nvidia Fermi relative to one core of Intel Nehalem, and 60 on 128 processors of Cray XMT. We demonstrate strong as well as weak scaling for graphs with up to a billion edges using up to 12,800 threads. We avoid excessive fine-tuning for each platform and retain the basic structure of the algorithm uniformly across platforms. An exception is the dataflow algorithm designed specifically for the Cray XMT. To the best of the authors' knowledge, this is the first such large-scale study of the half-approximate weighted matching problem on multithreaded platforms. Driven by the critical enabling role of combinatorial algorithms such as matching in scientific computing and the emergence of informatics applications, there is a growing demand to support irregular computations on current and future computing platforms. In this context, we evaluate the capability of emerging multithreaded platforms to tolerate latency induced by irregular memory access patterns, and to support fine-grained parallelism via light-weight synchronization mechanisms. By contrasting the architectural features of these platforms against the Cray XMT, which is specifically designed to support irregular memory-intensive applications, we delineate the impact of these choices on performance.
机译:图匹配是一个典型的组合问题,在高性能科学计算中有许多应用。用于计算匹配的最佳算法很难并行化。近似算法适用于并行化,因此对于计算大规模问题的匹配非常重要。逼近算法还生成了足以满足许多应用的最佳解决方案。在本文中,我们介绍了用于在最先进的多核(Intel Nehalem和AMD Magny-Cours),manycore(Nvidia Tesla和Nvidia Fermi)和大规模多线程(Cray XMT)平台上计算半近似加权匹配的多线程算法。 。我们提供两种实现:第一种使用共享工作队列,并且适合所有平台;第二种实现基于数据流原理,利用了Cray XMT上的特殊功能。通过使用精心挑选的数据集,该数据集展示了各种应用程序的特征,我们展示了跨不同平台的可扩展性能。特别是,对于输入的一个实例,即R-MAT图(RMAT-G),我们显示AMD Magny-Cours的48个内核上的加速比约为32,Intel Nehalem的8个内核上的加速比约为7,Nvidia Tesla和Linux上的加速比为3。相对于Intel Nehalem的一个内核,在Nvidia Fermi上为10,在Cray XMT的128个处理器上为60。对于使用多达12,800个线程的多达十亿条边的图形,我们演示了强而弱的缩放。我们避免对每个平台进行过多的微调,并在各个平台之间统一保留算法的基本结构。例外是专门为Cray XMT设计的数据流算法。据作者所知,这是对多线程平台上的半近似加权匹配问题的首次大规模研究。在组合算法的关键促成作用(例如科学计算中的匹配以及信息学应用程序的出现)的驱动下,对在当前和将来的计算平台上支持不规则计算的需求不断增长。在这种情况下,我们评估了新兴的多线程平台容忍由不规则内存访问模式引起的等待时间,并通过轻量级同步机制支持细粒度并行性的能力。通过将这些平台的体系结构功能与专门用于支持不规则内存密集型应用程序的Cray XMT进行对比,我们描述了这些选择对性能的影响。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号