Lattice Boltzmann for Large-Scale GPU Systems

Alan Gray, Alistair Hart, Alan Richardson, Kevin Stratford

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We describe the enablement of the Ludwig lattice Boltzmann parallel fluid dynamics application, designed specifically for complex problems, for massively parallel GPU-accelerated architectures. NVIDIA CUDA is introduced into the existing C/MPI framework, and we have given careful consideration to maintainability in addition to performance. Significant performance gains are realised on each GPU through restructuring of the data layout to allow memory coalescing and the adaptation of key loops to reduce off-chip memory accesses. The halo-swap communication phase has been designed to efficiently utilise many GPUs in parallel: included is the overlapping of several stages using CUDA stream functionality. The new GPU adaptation is seen to retain the good scaling behaviour of the original CPU code, and scales well up to 256 NVIDIA Fermi GPUs (the largest resource tested). The performance on the NVIDIA Fermi GPU is observed to be up to a factor of 4 greater than the (12-core) AMD Magny-Cours CPU (with all cores utilised) for a binary fluid benchmark.

Original language: English
Title of host publication: APPLICATIONS, TOOLS AND TECHNIQUES ON THE ROAD TO EXASCALE COMPUTING
Editors: K De Bosschere, EH D'Hollander, GR Joubert, D Padua, F Peters
Place of Publication: AMSTERDAM
Publisher: IOS Press
Pages: 167-174
Number of pages: 8
ISBN (Print): 978-1-61499-040-6
Publication status: Published - 2012
Event: 14th Biennial ParCo Conference (ParCo) - Ghent, Belgium
Duration: 31 Aug 2011 to 3 Sep 2011

Publication series

Name: Advances in Parallel Computing
Publisher: IOS PRESS
Volume: 22
ISSN (Print): 0927-5452

Conference

Conference: 14th Biennial ParCo Conference (ParCo)
Country: Belgium
Period: 31/08/11 to 3/09/11

Keywords

  • GPU
  • Lattice Boltzmann
  • CUDA
  • MPI
  • Optimisation
  • FLUIDS
