Scaling soft matter physics to thousands of graphics processing units in parallel

A. Gray, A. Hart, O. Henrich, K. Stratford

Research output: Contribution to journal › Article › peer-review

Abstract / Description of output

We describe a multi-graphics processing unit (GPU) implementation of the Ludwig application, which specialises in simulating a variety of complex fluids via lattice Boltzmann fluid dynamics coupled to additional physics describing complex fluid constituents. We describe our methodology in augmenting the original central processing unit (CPU) version with GPU functionality in a maintainable fashion. We present several optimisations that maximise performance on the GPU architecture through tuning for the GPU memory hierarchy. We describe how we implement particles within the fluid in such a way as to avoid a major divergence of the CPU and GPU codebases, whilst minimising data transfer at each time step. We detail our halo-exchange communication phase for the code, which exploits overlapping to allow efficient parallel scaling to many GPUs. We present results showing that the application demonstrates excellent scaling to at least 8192 GPUs in parallel, the largest system tested at the time of writing. The GPU version (on NVIDIA K20X GPUs) is around 3.5–5 times faster than the CPU version (on fully utilised AMD Opteron 6274 16-core CPUs), comparing equal numbers of CPUs and GPUs.
Original language: English
Pages (from-to): 274-283
Journal: International Journal of High Performance Computing Applications
Issue number: 3
Publication status: E-pub ahead of print - 25 Mar 2015

