TY - JOUR
T1 - A Lightweight Approach to Performance Portability with targetDP
AU - Gray, Alan
AU - Stratford, Kevin
N1 - 11 pages, 5 figures, accepted to the International Journal of High Performance Computing Applications (IJHPCA), acceptance date 27th October 2016
PY - 2018/3/1
Y1 - 2018/3/1
N2 - Leading HPC systems achieve their status through use of highly parallel devices such as NVIDIA GPUs or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data parallel hardware in a platform agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice QCD particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with MPI to allow use on systems containing multiple nodes: we demonstrate this through provision of scaling results on traditional and GPU-accelerated large scale supercomputers.
AB - Leading HPC systems achieve their status through use of highly parallel devices such as NVIDIA GPUs or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data parallel hardware in a platform agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice QCD particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with MPI to allow use on systems containing multiple nodes: we demonstrate this through provision of scaling results on traditional and GPU-accelerated large scale supercomputers.
U2 - 10.1177/1094342016682071
DO - 10.1177/1094342016682071
M3 - Article
SN - 1094-3420
SP - 288
EP - 301
JO - International Journal of High Performance Computing Applications
JF - International Journal of High Performance Computing Applications
ER -