International Symposium on Parallel and Distributed Computing

Tuning a Finite Difference Computation for Parallel Vector Processors



Abstract

Current CPU and GPU architectures make heavy use of data and instruction parallelism at different levels. Floating point operations are organised in vector instructions of increasing vector length, and for performance reasons it is mandatory to use these vector instructions efficiently. Several ways of tuning a model-problem finite difference stencil computation are discussed. The combination of vectorisation with an interleaved data layout, cache-aware algorithms, loop unrolling, parallelisation and parameter tuning leads to optimised implementations reaching 90% of the peak performance of the floating point pipelines on recent Intel Sandy Bridge and AMD Bulldozer CPU cores, both with AVX vector instructions, as well as on Nvidia Fermi/Kepler GPU architectures. Furthermore, we present numbers for parallel multi-core/multi-processor and multi-GPU configurations, which regularly achieve more than an order of magnitude of speedup compared to a standard implementation. The analysis may also explain deficiencies of automatic vectorisation for a linear data layout and serve as a foundation for efficient implementations of more complex expressions.
