An Evaluation of Threaded Models for a Classical MD Proxy Application

Abstract

Exascale systems will have many-core nodes, less memory capacity per core than today's systems, and a large degree of performance variability between cores. All of these conditions challenge bulk-synchronous SPMD models, in which execution is typically synchronous and communication is based on buffers and ghost regions. To address these constraints of future architectures, we explore the design of a multithreaded MD code and evaluate several tradeoffs that arise when converting an MPI application into a hybrid multithreaded application. Using OpenMP and PThreads, we implemented several variants of CoMD, a molecular dynamics proxy application. We found that, in CoMD, duplicating some of the work to avoid race conditions is an easier and more scalable solution than using atomic updates; that data allocation and placement can be controlled to some extent with a hybrid MPI+threads approach, though an explicit NUMA API to control locality may be desirable; and, finally, that dynamically scheduling the work within a process can mitigate the impact of performance variability among cores and preserve most of the performance, especially when compared to bulk-synchronous implementations such as the MPI reference version.
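The contrast between duplicating work and using atomic updates can be illustrated with a minimal sketch of a threaded pair-force loop. This is not CoMD's code: it assumes a brute-force O(N^2) pair search instead of link cells, a Lennard-Jones potential in reduced units, and Python threads standing in for OpenMP/PThreads (the GIL means no speedup; only the data-access pattern is shown). A `threading.Lock` is a coarse stand-in for per-element atomic adds. The names `lj_pair`, `forces_atomic`, `forces_duplicated`, and `CUTOFF2` are hypothetical. Submitting many small chunks to a `ThreadPoolExecutor` lets idle workers pick up the next chunk on demand, which is loosely analogous to dynamic scheduling within a process.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

CUTOFF2 = 2.5 ** 2  # squared interaction cutoff, reduced LJ units (assumed)


def lj_pair(ri, rj):
    """Force on atom i from atom j (Lennard-Jones, epsilon = sigma = 1)."""
    d = [a - b for a, b in zip(ri, rj)]
    r2 = sum(c * c for c in d)
    if r2 >= CUTOFF2:
        return (0.0, 0.0, 0.0)
    inv6 = 1.0 / r2 ** 3
    scale = 24.0 * inv6 * (2.0 * inv6 - 1.0) / r2
    return tuple(scale * c for c in d)


def forces_atomic(pos, chunk, n_threads):
    """Half pair loop (Newton's third law): each pair is computed once,
    but both f[i] and f[j] are updated, so writes can collide across tasks."""
    n = len(pos)
    f = [[0.0, 0.0, 0.0] for _ in range(n)]
    lock = threading.Lock()  # coarse stand-in for per-element atomic adds

    def work(start):
        for i in range(start, min(start + chunk, n)):
            for j in range(i + 1, n):
                fij = lj_pair(pos[i], pos[j])
                with lock:  # f[j] (and f[i]) may be touched by another task
                    for k in range(3):
                        f[i][k] += fij[k]
                        f[j][k] -= fij[k]

    # Many small chunks handed to idle workers on demand (dynamic scheduling).
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, range(0, n, chunk)))
    return f


def forces_duplicated(pos, chunk, n_threads):
    """Full pair loop: every pair is computed twice (once from each side),
    but each task writes only the forces of the atoms it owns."""
    n = len(pos)
    f = [[0.0, 0.0, 0.0] for _ in range(n)]

    def work(start):
        for i in range(start, min(start + chunk, n)):
            acc = [0.0, 0.0, 0.0]
            for j in range(n):
                if j != i:
                    fij = lj_pair(pos[i], pos[j])
                    for k in range(3):
                        acc[k] += fij[k]
            f[i] = acc  # single writer per atom: no race, no lock needed

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(work, range(0, n, chunk)))
    return f
```

Called on the same positions, both functions should return forces that agree to within floating-point rounding. The duplicated variant performs roughly twice as many pair evaluations but needs no synchronization on the force array, which is the tradeoff the abstract reports as favoring duplication in CoMD.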