Journal: Concurrency and Computation: Practice and Experience

Design of self-adaptable data parallel applications on multicore clusters automatically optimized for performance and energy through load distribution

Abstract

Self-adaptability is a highly preferred feature in HPC applications. A crucial building block of a self-adaptable application is a data partitioning algorithm that must possess several essential qualities apart from low runtime and memory costs. On modern platforms composed of multicore CPUs, data partitioning algorithms striving to solve the bi-objective optimization problem for performance and energy (BOPPE) face a formidable challenge. They must take into account the new complexities inherent in these platforms, such as severe resource contention and non-uniform memory access (NUMA). Novel model-based methods and data partitioning algorithms have been proposed that address the challenge. However, these methods take as input full functional performance and energy models (FPM and FEM), which have prohibitively high model construction costs. Therefore, they are not suitable for use in self-adaptable applications. In this paper, we present a self-adaptable data partitioning algorithm called ADAPTALEPH, which solves BOPPE on homogeneous clusters of multicore CPUs. Unlike the state-of-the-art methods for solving BOPPE, which take full FPM and FEM as input, it constructs partial FPM and FEM during its execution using all the available processors. It returns a locally Pareto-optimal set of solutions, which are the heterogeneous workload distributions that achieve inter-node optimization of data-parallel applications for performance and energy. We experimentally study the efficiency of ADAPTALEPH for three data-parallel applications, i.e., matrix-vector multiplication, matrix-matrix multiplication, and fast Fourier transform, on a modern multicore CPU and in simulations of homogeneous clusters of such CPUs. We demonstrate that the locally Pareto-optimal front approaches the globally Pareto-optimal front as the number of points in the partial discrete FPM and FEM functions is increased. The number of points in the partial FPM/FEM at which the locally Pareto-optimal front becomes the globally Pareto-optimal front is considerably smaller than the number of points in the full FPM/FEM, which suggests developing methods that leverage this finding to drastically reduce model construction times.
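
To make the notion of a Pareto-optimal set of workload distributions concrete, here is a minimal brute-force sketch. It is not the ADAPTALEPH algorithm, whose details the abstract does not give. It assumes a hypothetical discrete time model `TIME` and energy model `ENERGY` (invented illustrative values, shared by all processors because the cluster is homogeneous), takes execution time as the maximum over processors and energy as the sum, and keeps only the non-dominated distributions. The names `solve` and `pareto_front` are also hypothetical.

```python
from itertools import combinations_with_replacement

# Hypothetical discrete models: workload size -> time (s) and energy (J).
# The values are invented; time is deliberately non-monotonic to mimic
# the resource contention mentioned in the abstract.
TIME = {0: 0.0, 1: 1.0, 2: 3.0, 3: 2.5, 4: 5.0}
ENERGY = {0: 0.0, 1: 2.0, 2: 3.0, 3: 7.0, 4: 8.0}


def pareto_front(candidates):
    """Keep candidates whose (time, energy) pair no other candidate dominates."""
    front = []
    for t, e, d in candidates:
        dominated = any(
            t2 <= t and e2 <= e and (t2 < t or e2 < e)
            for t2, e2, _ in candidates
        )
        if not dominated:
            front.append((t, e, d))
    return sorted(front)


def solve(n, p, time_of, energy_of):
    """Enumerate distributions of workload n over p identical processors.

    Only sizes present in the models are considered. Because the processors
    are identical, distributions are listed as non-decreasing tuples.
    """
    candidates = []
    for d in combinations_with_replacement(sorted(time_of), p):
        if sum(d) != n:
            continue
        t = max(time_of[x] for x in d)    # processors run in parallel
        e = sum(energy_of[x] for x in d)  # energy adds up across processors
        candidates.append((t, e, d))
    return pareto_front(candidates)


front = solve(4, 2, TIME, ENERGY)
for t, e, d in front:
    print(f"distribution={d} time={t} energy={e}")
```

With these invented models, the uneven distribution (1, 3) is the fastest and the even distribution (2, 2) uses the least energy, so both are on the front. The sketch enumerates every combination, which is only feasible for tiny inputs.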
