Journal: Big Data Research

Applying Parallel Computing Techniques to Analyze Terabyte Atmospheric Boundary Layer Model Outputs



Abstract

In the atmospheric sciences, the size of simulation output continues to grow as computational resources able to handle simulations with fine-scale spatial and temporal resolutions become more accessible. As output size increases, serial data analysis methods become overwhelmed, resulting in either long processing delays or outright failures due to memory constraints. Parallel data analysis methods can alleviate these issues; however, atmospheric scientists are often unfamiliar with how to implement them. Example methods are therefore needed to guide the use of parallel processing in the analysis of Big Data from atmospheric simulations. In this work, practical methods are presented by which an analysis may be executed in parallel using the Message Passing Interface (MPI) and Python. These methods first consider the inherent spatial dependencies of a particular data analysis process. Once these dependencies are identified, the dataset can be distributed horizontally or vertically across processes with minimal interprocess communication. In addition, an analysis method is classified as either data-transfer-limited or computationally limited. In data-transfer-limited problems, data transfer time outweighs processing time. In computationally limited problems, processing time outweighs data transfer time. The results show that increasing processor count improves the execution time of computationally limited problems. For data-transfer-limited problems, increasing node count offers the greatest improvement. To further improve the performance of computationally limited problems, a Graphics Processing Unit (GPU) and the Compute Unified Device Architecture (CUDA) framework are used. This GPU implementation is shown to offer further improvement over the MPI version of the analysis methods tested.
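
The dependency-driven distribution described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes a 3-D field of shape (nz, ny, nx) and assumes that an analysis coupling points within a horizontal plane keeps planes whole and splits along z, while an analysis coupling points within a vertical column keeps columns whole and splits along y. The function names `block_bounds`, `split_axis`, and `decompose` are hypothetical. The loop over ranks stands in for an MPI run; in a real mpi4py program each process would instead obtain its own rank and the process count from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`.

```python
def block_bounds(n_items, n_procs, rank):
    """Return the half-open index range [start, stop) owned by `rank`.

    Items are spread as evenly as possible: the first `n_items % n_procs`
    ranks each receive one extra item.
    """
    base, extra = divmod(n_items, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def split_axis(dependency):
    """Choose which axis of a (nz, ny, nx) field to distribute.

    Assumption for this sketch: the axis that is split is one the analysis
    does NOT couple, so no process needs data held by another process.
    """
    if dependency == "horizontal":
        return 0  # keep each horizontal plane whole, split along z
    if dependency == "vertical":
        return 1  # keep each vertical column whole, split along y
    raise ValueError(f"unknown dependency: {dependency!r}")


def decompose(shape, dependency, n_procs):
    """Return the split axis and the index range assigned to every rank."""
    axis = split_axis(dependency)
    bounds = [block_bounds(shape[axis], n_procs, r) for r in range(n_procs)]
    return axis, bounds


if __name__ == "__main__":
    # Hypothetical grid: 10 vertical levels, 8 x 8 horizontal points, 4 processes.
    axis, bounds = decompose((10, 8, 8), "horizontal", 4)
    for rank, (start, stop) in enumerate(bounds):
        print(f"rank {rank}: axis {axis}, indices [{start}, {stop})")
```

Because each process reads only its own index range along an uncoupled axis, a per-plane or per-column computation needs no exchange of data between processes until the final results are gathered.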
