Split tiling for GPUs: automatic parallelization using trapezoidal tiles

Tobias Grosser, Albert Cohen, Paul HJ Kelly, J Ramanujam, P Sadayappan, Sven Verdoolaege

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Tiling is a key technique to enhance data reuse. For computations structured as one sequential outer "time" loop enclosing a set of parallel inner loops, tiling only the parallel inner loops may not enable enough data reuse in the cache. Tiling the inner loops along with the outer time loop enhances data locality but may require other transformations like loop skewing that inhibit inter-tile parallelism.

One approach to tiling that enhances data locality without inhibiting inter-tile parallelism is split tiling, where tiles are subdivided into a sequence of trapezoidal computation steps. In this paper, we develop an approach to generate split tiled code for GPUs in the PPCG polyhedral code generator. We propose a generic algorithm to calculate index-set splitting that enables us to perform tiling for locality and synchronization avoidance, while simultaneously maintaining parallelism, without the need for skewing or redundant computations. Our algorithm performs split tiling for an arbitrary number of dimensions and without the need to construct any large integer linear program. The method and its implementation are evaluated on standard stencil kernels and compared with a state-of-the-art polyhedral compiler and with a domain-specific stencil compiler, both targeting CUDA GPUs.
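
The two-phase structure described above can be illustrated with a minimal sketch for a 1D three-point Jacobi stencil. This is a hand-written illustration, not the PPCG algorithm or its generated code: the function names (`stencil`, `jacobi_reference`, `jacobi_split_tiled`), the periodic boundary condition, and the per-band storage of every time level are assumptions made to keep the example short. Within each band of `height` time steps, phase 1 computes one shrinking (upright) trapezoid per tile, and phase 2 fills the remaining inverted trapezoids around each tile boundary. All iterations of the tile loop inside a phase are independent of one another, so each phase could run in parallel with a single synchronization between them, and no skewing or redundant computation is involved.

```python
def stencil(left, center, right):
    """Three-point averaging stencil (illustrative choice)."""
    return (left + center + right) / 3.0


def jacobi_reference(a, steps):
    """Untiled Jacobi iteration with periodic boundaries."""
    n = len(a)
    cur = list(a)
    for _ in range(steps):
        cur = [stencil(cur[i - 1], cur[i], cur[(i + 1) % n]) for i in range(n)]
    return cur


def jacobi_split_tiled(a, steps, width, height):
    """Same computation, split-tiled: tiles of `width` points, bands of `height` steps."""
    n = len(a)
    # Assumption of this sketch: tiles divide the domain evenly and are wide
    # enough that the trapezoids of neighbouring tiles never overlap.
    assert n % width == 0 and width >= 2 * height
    cur = list(a)
    for t0 in range(0, steps, height):
        h = min(height, steps - t0)  # the last band may be shorter
        # levels[s] holds the values at time t0 + s (kept in full for clarity).
        levels = [cur] + [[None] * n for _ in range(h)]

        # Phase 1: one upright trapezoid per tile; it shrinks by one point on
        # each side per time step, so it only reads values from its own tile.
        for lo in range(0, n, width):  # iterations are independent
            hi = lo + width
            for s in range(1, h + 1):
                prev, nxt = levels[s - 1], levels[s]
                for i in range(lo + s - 1, hi - s + 1):
                    nxt[i] = stencil(prev[i - 1], prev[i], prev[(i + 1) % n])

        # Phase 2: one inverted trapezoid per tile boundary b; it grows by one
        # point on each side per time step and reads the phase 1 results.
        for b in range(0, n, width):  # iterations are independent
            for s in range(2, h + 1):
                prev, nxt = levels[s - 1], levels[s]
                for j in range(b - s + 1, b + s - 1):
                    i = j % n  # wrap around the periodic domain
                    nxt[i] = stencil(prev[i - 1], prev[i], prev[(i + 1) % n])

        cur = levels[h]
    return cur
```

Calling `jacobi_split_tiled(a, steps, width, height)` should return the same values as `jacobi_reference(a, steps)`, since every point is computed by the same expression from the same inputs; only the execution order changes. A GPU implementation would additionally map tiles to thread blocks and stage data in shared memory, which this sketch does not attempt to show.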

Original language: English
Title of host publication: Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Publisher: Association for Computing Machinery (ACM)
Pages: 24-31
Number of pages: 8
ISBN (Electronic): 9781450320177
Publication status: Published - 16 Mar 2013
Event: 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU 2013 - Houston, TX, United States
Duration: 16 Mar 2013

Conference

Conference: 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU 2013
Country: United States
City: Houston, TX
Period: 16/03/13
