In this paper, we consider sparse Cholesky factorization on a multiprocessor system that possesses a globally shared memory. Our algorithm is a parallel version of a serial blocked left-looking factorization algorithm. Unlike previous parallel left-looking algorithms, the new algorithm uses a matrix-matrix multiplication operation to implement its computationally intensive primitives. Consequently, careful implementation of these primitives enables extensive reuse of data in cache for most realistic problems, and thus reduces the volume of traffic to and from main memory. Reducing memory traffic is crucial on many shared-memory multiprocessors because the interconnect to main memory is often a serious bottleneck. We shall compare the performance of the parallel blocked algorithm with an earlier parallel left-looking algorithm studied.
展开▼