IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2009)

Minimizing startup costs for performance-critical threading


Abstract

Using the well-known ATLAS and LAPACK dense linear algebra libraries, we demonstrate that the parallel management overhead (PMO) can grow with problem size even on statically scheduled parallel programs with minimal task interaction. Therefore, the widely held view that these thread management issues can be ignored in such computationally intensive libraries is wrong, and leads to substantial slowdown on today's machines. We survey several methods for reducing this overhead, the best of which we have not seen in the literature. Finally, we demonstrate that by applying these techniques at the kernel level, performance in applications such as LU and QR factorizations can be improved by almost 40% for small problems, and as much as 15% for large O(N^3) computations. These techniques are completely general, and should yield significant speedup in almost any performance-critical operation. We then show that the lion's share of the remaining parallel inefficiency comes from bus contention, and, in the future work section, outline some promising avenues for further improvement.