ICA3PP 2014

Porting the Princeton Ocean Model to GPUs

Abstract

While GPUs are becoming a compelling acceleration solution for a range of scientific applications, most existing work on climate models has achieved only limited speedup. This is due to partial porting of the large code bases and the inherently memory-bound nature of these models. In this work, we design and implement a customized GPU-based acceleration of the Princeton Ocean Model (gpuPOM). Based on Nvidia's state-of-the-art GPU architectures (K20X and K40m), we completely rewrite the original model from Fortran into CUDA-C. We present several acceleration methods, including optimizing memory access on a single GPU and overlapping communication and boundary operations among multiple GPUs. Experimental results show that, compared with different Intel CPUs, gpuPOM achieves a 6.9-fold to 17.8-fold speedup on one K40m GPU and a 5.8-fold to 15.5-fold speedup on one K20X GPU. Further experiments on multiple GPUs indicate that the performance of gpuPOM on a super-workstation containing 4 GPUs is equivalent to that of a powerful cluster consisting of 34 pure CPU nodes with over 400 CPU cores.
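The abstract names the overlap of communication and boundary operations but gives no implementation details. The following is only a minimal CPU-side sketch of one common reading of that idea, not the gpuPOM code: each subdomain updates the cells next to the shared interface first, starts the halo exchange in the background, and updates its interior cells while the exchange is in flight. The three-point `stencil`, the two-subdomain split, and all function names are assumptions made for illustration. On GPUs the same ordering is typically expressed with separate streams and asynchronous copies.

```python
from concurrent.futures import ThreadPoolExecutor


def stencil(u, i):
    # Three-point average standing in for a real model kernel (assumption).
    return (u[i - 1] + u[i] + u[i + 1]) / 3.0


def step_reference(u):
    # Single-domain update; the two end values are held fixed.
    return [u[0]] + [stencil(u, i) for i in range(1, len(u) - 1)] + [u[-1]]


def step_overlapped(left, right, pool):
    # Each subdomain carries one halo cell at the shared interface:
    # left[-1] mirrors right[1], and right[0] mirrors left[-2].
    new_left, new_right = left[:], right[:]

    # 1. Update the cells adjacent to the interface first.
    new_left[-2] = stencil(left, len(left) - 2)
    new_right[1] = stencil(right, 1)

    # 2. Start the halo exchange in the background.
    def exchange():
        new_left[-1] = new_right[1]
        new_right[0] = new_left[-2]

    pending = pool.submit(exchange)

    # 3. Update the interior cells while the exchange is in flight.
    #    These loops read only the old arrays and never touch halo cells.
    for i in range(1, len(left) - 2):
        new_left[i] = stencil(left, i)
    for i in range(2, len(right) - 1):
        new_right[i] = stencil(right, i)

    pending.result()  # wait for the halos before the next step
    return new_left, new_right


def run(u, k, steps):
    # Split the global array at index k, adding one halo cell per side.
    left, right = u[:k + 1], u[k - 1:]
    with ThreadPoolExecutor(max_workers=1) as pool:
        for _ in range(steps):
            left, right = step_overlapped(left, right, pool)
    return left[:-1] + right[1:]  # drop the halo cells and reassemble
```

Python threads do not give real parallel speedup here; the sketch only shows the ordering that makes the overlap safe. The interior update depends on no halo value from the current step, so it can proceed independently of the exchange.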