TY - UNPB
T1 - The Promises of Hybrid Hexagonal/Classical Tiling for GPU
AU - Grosser, Tobias
AU - Verdoolaege, Sven
AU - Cohen, Albert
AU - Sadayappan, P.
PY - 2013/7/27
Y1 - 2013/7/27
N2 - Time-tiling is necessary for efficient execution of iterative stencil computations. But the usual hyper-rectangular tiles cannot be used because of positive/negative dependence distances along the stencil's spatial dimensions. Several prior efforts have addressed this issue. However, known techniques trade enhanced data reuse for other causes of inefficiency, such as unbalanced parallelism, redundant computations, or increased control flow overhead incompatible with efficient GPU execution. We explore a new path to maximize the effectiveness of time-tiling on iterative stencil computations. Our approach is particularly well suited for GPUs. It does not require any redundant computations, favors coalesced global-memory access and data reuse in shared-memory/cache, avoids thread divergence, and extracts a high degree of parallelism. We introduce hybrid hexagonal tiling, combining hexagonal tile shapes along the time (sequential) dimension and one spatial dimension, with classical tiling for other spatial dimensions. A hexagonal tile shape simultaneously enables parallel tile execution and reuse along the time dimension. Experimental results demonstrate significant performance improvements over existing stencil compilers.
AB - Time-tiling is necessary for efficient execution of iterative stencil computations. But the usual hyper-rectangular tiles cannot be used because of positive/negative dependence distances along the stencil's spatial dimensions. Several prior efforts have addressed this issue. However, known techniques trade enhanced data reuse for other causes of inefficiency, such as unbalanced parallelism, redundant computations, or increased control flow overhead incompatible with efficient GPU execution. We explore a new path to maximize the effectiveness of time-tiling on iterative stencil computations. Our approach is particularly well suited for GPUs. It does not require any redundant computations, favors coalesced global-memory access and data reuse in shared-memory/cache, avoids thread divergence, and extracts a high degree of parallelism. We introduce hybrid hexagonal tiling, combining hexagonal tile shapes along the time (sequential) dimension and one spatial dimension, with classical tiling for other spatial dimensions. A hexagonal tile shape simultaneously enables parallel tile execution and reuse along the time dimension. Experimental results demonstrate significant performance improvements over existing stencil compilers.
M3 - Working paper
BT - The Promises of Hybrid Hexagonal/Classical Tiling for GPU
PB - INRIA
ER -