Approximate weighted matching on emerging manycore and multithreaded architectures

Mahantesh Halappanavar; John Feo; Oreste Villa; Antonino Tumeo; Alex Pothen

首页> 外文期刊>International Journal of High Performance Computing Applications >Approximate weighted matching on emerging manycore and multithreaded architectures

【24h】

Approximate weighted matching on emerging manycore and multithreaded architectures

机译：新兴的多核和多线程体系结构上的近似加权匹配

获取原文

获取原文并翻译 | 示例

掌桥外文数据库（机构版） >>

开具论文收录证明 >>

文献代查 >>

页面导航

摘要
著录项
相似文献
相关主题

摘要

Graph matching is a prototypical combinatorial problem with many applications in high-performance scientific computing. Optimal algorithms for computing matchings are challenging to parallelize. Approximation algorithms are amenable to parallelization and are therefore important to compute matchings for large-scale problems. Approximation algorithms also generate nearly optimal solutions that are sufficient for many applications. In this paper we present multithreaded algorithms for computing half-approximate weighted matching on state-of-the-art multicore (Intel Nehalem and AMD Magny-Cours), manycore (Nvidia Tesla and Nvidia Fermi), and massively multithreaded (Cray XMT) platforms. We provide two implementations: the first uses shared work queues and is suited for all platforms; and the second implementation, based on dataflow principles, exploits special features available on the Cray XMT. Using a carefully chosen dataset that exhibits characteristics from a wide range of applications, we show scalable performance across different platforms. In particular, for one instance of the input, an R-MAT graph (RMAT-G), we show speedups of about 32 on 48 cores of an AMD Magny-Cours, 7 on 8 cores of Intel Nehalem, 3 on Nvidia Tesla and 10 on Nvidia Fermi relative to one core of Intel Nehalem, and 60 on 128 processors of Cray XMT. We demonstrate strong as well as weak scaling for graphs with up to a billion edges using up to 12,800 threads. We avoid excessive fine-tuning for each platform and retain the basic structure of the algorithm uniformly across platforms. An exception is the dataflow algorithm designed specifically for the Cray XMT. To the best of the authors' knowledge, this is the first such large-scale study of the half-approximate weighted matching problem on multithreaded platforms. Driven by the critical enabling role of combinatorial algorithms such as matching in scientific computing and the emergence of informatics applications, there is a growing demand to support irregular computations on current and future computing platforms. In this context, we evaluate the capability of emerging multithreaded platforms to tolerate latency induced by irregular memory access patterns, and to support fine-grained parallelism via light-weight synchronization mechanisms. By contrasting the architectural features of these platforms against the Cray XMT, which is specifically designed to support irregular memory-intensive applications, we delineate the impact of these choices on performance.

机译：图匹配是一个典型的组合问题，在高性能科学计算中有许多应用。用于计算匹配的最佳算法很难并行化。近似算法适用于并行化，因此对于计算大规模问题的匹配非常重要。逼近算法还生成了足以满足许多应用的最佳解决方案。在本文中，我们介绍了用于在最先进的多核（Intel Nehalem和AMD Magny-Cours），manycore（Nvidia Tesla和Nvidia Fermi）和大规模多线程（Cray XMT）平台上计算半近似加权匹配的多线程算法。。我们提供两种实现：第一种使用共享工作队列，并且适合所有平台；第二种实现基于数据流原理，利用了Cray XMT上的特殊功能。通过使用精心挑选的数据集，该数据集展示了各种应用程序的特征，我们展示了跨不同平台的可扩展性能。特别是，对于输入的一个实例，即R-MAT图（RMAT-G），我们显示AMD Magny-Cours的48个内核上的加速比约为32，Intel Nehalem的8个内核上的加速比约为7，Nvidia Tesla和Linux上的加速比为3。相对于Intel Nehalem的一个内核，在Nvidia Fermi上为10，在Cray XMT的128个处理器上为60。对于使用多达12,800个线程的多达十亿条边的图形，我们演示了强而弱的缩放。我们避免对每个平台进行过多的微调，并在各个平台之间统一保留算法的基本结构。例外是专门为Cray XMT设计的数据流算法。据作者所知，这是对多线程平台上的半近似加权匹配问题的首次大规模研究。在组合算法的关键促成作用（例如科学计算中的匹配以及信息学应用程序的出现）的驱动下，对在当前和将来的计算平台上支持不规则计算的需求不断增长。在这种情况下，我们评估了新兴的多线程平台容忍由不规则内存访问模式引起的等待时间，并通过轻量级同步机制支持细粒度并行性的能力。通过将这些平台的体系结构功能与专门用于支持不规则内存密集型应用程序的Cray XMT进行对比，我们描述了这些选择对性能的影响。

著录项

来源
《International Journal of High Performance Computing Applications》 |2012年第4期|413-430|共18页
作者
Mahantesh Halappanavar; John Feo; Oreste Villa; Antonino Tumeo; Alex Pothen;
展开▼
作者单位

Pacific Northwest National Laboratory, 902 Battelle Boulevard, P.O.Box 999, MSIN J4-30, Richland, WA 99352, USA;

Pacific Northwest National Laboratory, Richland, WA, USA;

Pacific Northwest National Laboratory, Richland, WA, USA;

Pacific Northwest National Laboratory, Richland, WA, USA;

Purdue University, West Lafayette, IN, USA;

展开▼
收录信息美国《科学引文索引》(SCI);美国《工程索引》(EI);
原文格式 PDF
正文语种 eng
中图分类
关键词
approximation; dataflow; GPU; graph algorithms; manycore; matching; multicore; weighted matching;

机译：近似;数据流;GPU;图算法多核匹配;多核加权匹配;

相似文献

外文文献
中文文献
专利

1. Architecture, Languages, Compilation and Hardware support for Emerging ManYcore systems (ALCHEMY): Preface [J] . Johanna Sepúlveda, Vania Marangozova-Martin, Jeronimo Castrillon Procedia Computer Science . 2017,第1期

机译：新兴ManYcore系统的体系结构，语言，编译和硬件支持（ALCHEMY）：前言
2. Exploring emerging manycore architectures for uncertainty quantification through embedded stochastic Galerkin methods [J] . Eric Phipps, Jonathan Hu, Jakob T. Ostien International journal of computer mathematics . 2014,第3a4期

机译：探索新兴的多核架构，通过嵌入式随机Galerkin方法进行不确定性量化
3. FPGA prototyping of emerging manycore architectures for parallel programming research using Formic boards [J] . Spyros Lyberis, George Kalokerinos, Michalis Lygerakis, Journal of systems architecture . 2014,第6期

机译：使用Formic板卡进行并行编程研究的新兴多核架构的FPGA原型
4. A multithreaded algorithm for network alignment via approximate matching [C] . Khan Arif M., Gleich David F., Pothen Alex, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis. . 2012

机译：通过近似匹配进行网络对齐的多线程算法
5. An integrated power, area, and timing modeling framework for the design of multithreaded and multi/manycore architectures. [D] . Li, Sheng. 2010

机译：一个集成的电源，区域和时序建模框架，用于设计多线程和多/多核架构。
6. Dynamic partitioning of search patterns for approximate pattern matching using search schemes [O] . Luca Renders, Kathleen Marchal, Jan Fostier 2021

机译：使用搜索方案进行近似模式匹配的搜索模式的动态分区
7. Optimizing Approximate Weighted Matching on Nvidia Kepler K40 [O] . Naim, Md., Manne, Fredrik, Halappanavar, Mahantesh, 2018

机译：优化Nvidia Kepler K40的近似加权匹配

Approximate weighted matching on emerging manycore and multithreaded architectures

摘要

著录项

相似文献

相关主题

期刊订阅