PARTANS: An autotuning framework for stencil computation on multi-GPU systems

Thibaut Lutz, Christian Fensch, Murray Cole

Research output: Contribution to journal › Article › peer-review

Abstract

GPGPUs are a powerful and energy-efficient solution for many problems. For higher performance or larger problems, it is necessary to distribute the problem across multiple GPUs, increasing the already high programming complexity.

In this article, we focus on abstracting the complexity of multi-GPU programming for stencil computation. We show that the best strategy depends not only on the stencil operator, problem size, and GPU, but also on the PCI Express layout. This adds nonuniform characteristics to a seemingly homogeneous setup, causing up to 23% performance loss. We address this issue with an autotuner that optimizes the distribution across multiple GPUs.
Original language: English
Article number: 59
Number of pages: 24
Journal: ACM Transactions on Architecture and Code Optimization
Volume: 9
Issue number: 4
Publication status: Published - Jan 2013
