
A hybrid Cholesky decomposition algorithm for multicore CPUs with GPU accelerators


Gary Macindoe




Friday, 08 February 2013






Cruciform B404 - LT2

Event series

DeepMind/ELLIS CSML Seminar Series


The Cholesky decomposition is used throughout computational statistics and is often the performance bottleneck of the algorithms that rely on it. As the number of cores in a processor increases, algorithms need to be redesigned to extract performance by running operations in parallel rather than relying on increases in clock speed. In addition, graphics processing units (GPUs) are capable of executing tens of thousands of operations in parallel and are no longer restricted to graphical calculations.

We have developed a Cholesky decomposition algorithm for multi-core CPUs and GPUs. We introduce a new method of copying submatrices and use it to have the GPU and CPU work on the matrix in parallel. We add a new level of dynamic blocking that matches the workload to the compute device at each iteration. We also exploit the differences between SIMD and SIMT programming to have multiple functions execute simultaneously on older classes of GPU that do not have this capability built into the hardware.

Our methods are generally applicable to blocked algorithms for linear algebra such as those in the LAPACK library.
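
For orientation, the kind of blocked algorithm referred to here can be sketched as a right-looking blocked Cholesky factorisation. This is a minimal single-threaded illustration in pure Python, not the hybrid CPU/GPU algorithm presented in the talk: the function names (blocked_cholesky, cholesky_unblocked) and the fixed block size are assumptions made for the example, and the comments naming LAPACK/BLAS routines only indicate which routine plays a similar role.

```python
import math


def cholesky_unblocked(a, lo, hi):
    """Factor the diagonal block a[lo:hi][lo:hi] in place (lower triangle).

    Plays a role similar to LAPACK's unblocked POTF2 step. Assumes earlier
    panels have already been subtracted from this block.
    """
    for j in range(lo, hi):
        d = a[j][j] - sum(a[j][k] ** 2 for k in range(lo, j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[j][j] = math.sqrt(d)
        for i in range(j + 1, hi):
            s = sum(a[i][k] * a[j][k] for k in range(lo, j))
            a[i][j] = (a[i][j] - s) / a[j][j]


def blocked_cholesky(matrix, block=2):
    """Return lower-triangular L with L * L^T == matrix.

    `matrix` is a symmetric positive definite list of lists; `block` is a
    fixed, illustrative block size (the talk describes dynamic blocking,
    which this sketch does not attempt).
    """
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for lo in range(0, n, block):
        hi = min(lo + block, n)

        # Step 1: factor the small diagonal block.
        cholesky_unblocked(a, lo, hi)

        # Step 2: panel solve, L21 = A21 * L11^{-T} (a TRSM-like step).
        for i in range(hi, n):
            for j in range(lo, hi):
                s = sum(a[i][k] * a[j][k] for k in range(lo, j))
                a[i][j] = (a[i][j] - s) / a[j][j]

        # Step 3: trailing update, A22 -= L21 * L21^T (a SYRK-like step).
        # Only the lower triangle is updated.
        for i in range(hi, n):
            for j in range(hi, i + 1):
                a[i][j] -= sum(a[i][k] * a[j][k] for k in range(lo, hi))

    # Clear the strictly upper triangle so the result is exactly L.
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0
    return a


if __name__ == "__main__":
    A = [[4.0, 12.0, -16.0],
         [12.0, 37.0, -43.0],
         [-16.0, -43.0, 98.0]]
    for row in blocked_cholesky(A, block=2):
        print(row)
```

Steps 2 and 3 are matrix-matrix operations whose cost grows with the size of the trailing submatrix, which is why blocked formulations of this shape lend themselves to parallel execution.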

Slides for the talk: PDF