International Symposium on Parallel and Distributed Computing

Tuning a Finite Difference Computation for Parallel Vector Processors



Abstract

Current CPU and GPU architectures make heavy use of data and instruction parallelism at different levels. Floating point operations are organised in vector instructions of increasing vector length, and for performance reasons it is mandatory to use these vector instructions efficiently. Several ways of tuning a model-problem finite difference stencil computation are discussed. The combination of vectorisation with an interleaved data layout, cache-aware algorithms, loop unrolling, parallelisation and parameter tuning leads to optimised implementations reaching 90% of the peak performance of the floating point pipelines on recent Intel Sandy Bridge and AMD Bulldozer CPU cores, both with AVX vector instructions, as well as on Nvidia Fermi/Kepler GPU architectures. Furthermore, we present numbers for parallel multi-core/multi-processor and multi-GPU configurations, which regularly achieve more than an order of magnitude of speedup compared to a standard implementation. The analysis may also explain deficiencies of automatic vectorisation for a linear data layout and serve as a foundation for efficient implementations of more complex expressions.
