Automatic Optimization of Thread-coarsening for Graphics Processors

Alberto Magni, Christophe Dubach, Michael O'Boyle

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract / Description of output

OpenCL has been designed to achieve functional portability across multi-core devices from different vendors. However, the lack of a single cross-target optimizing compiler severely limits performance portability of OpenCL programs. Programmers need to manually tune applications for each specific device, preventing effective portability. We target a compiler transformation specific to data-parallel languages, thread-coarsening, and show that it can improve performance across different GPU devices. We then address the problem of selecting the best value for the coarsening factor parameter, i.e., deciding how many threads to merge together. We experimentally show that this is a hard problem to solve: good configurations are difficult to find, and naive coarsening in fact leads to substantial slowdowns. We propose a solution based on a machine-learning model that predicts the best coarsening factor using kernel-function static features. The model automatically specializes to the different architectures considered. We evaluate our approach on 17 benchmarks on four devices: two Nvidia GPUs and two different generations of AMD GPUs. Using our technique, we achieve speedups between 1.11X and 1.33X on average.
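
As a rough illustration of what "merging threads together" means, the following minimal Python sketch models a kernel launch as a sequential loop over thread ids. It is not the authors' compiler transformation and it does not model GPU performance. All names (square_kernel, coarsened_square_kernel, launch, run_baseline, run_coarsened) are hypothetical, and the strided index mapping is an assumption chosen for the example.

```python
# Toy model of thread coarsening. Each "thread" is one call of a kernel
# function; a real GPU would run these calls in parallel.
# All names are hypothetical and are not taken from the paper.

def square_kernel(thread_id, src, dst):
    """Baseline kernel: one logical thread squares one element."""
    dst[thread_id] = src[thread_id] * src[thread_id]


def coarsened_square_kernel(thread_id, src, dst, factor):
    """Coarsened kernel: one thread does the work of `factor` baseline threads."""
    num_threads = len(src) // factor
    for k in range(factor):
        # Assumed strided mapping: thread t handles t, t + n, t + 2n, ...
        # where n is the number of threads after coarsening.
        i = thread_id + k * num_threads
        dst[i] = src[i] * src[i]


def launch(kernel, num_threads, *args):
    """Stand-in for a GPU launch: run the kernel once per thread id."""
    for tid in range(num_threads):
        kernel(tid, *args)


def run_baseline(src):
    dst = [0] * len(src)
    launch(square_kernel, len(src), src, dst)
    return dst


def run_coarsened(src, factor):
    # This sketch only handles sizes that the factor divides evenly.
    assert len(src) % factor == 0
    dst = [0] * len(src)
    launch(coarsened_square_kernel, len(src) // factor, src, dst, factor)
    return dst
```

In this sketch, run_coarsened(data, 4) launches a quarter of the threads that run_baseline(data) does and produces the same output. Which factor actually runs fastest on a given device is the question the abstract's machine-learning model is meant to answer; the sketch says nothing about that.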
Original language: English
Title of host publication: PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation
Place of Publication: New York, NY, USA
Number of pages: 12
ISBN (Print): 978-1-4503-2809-8
Publication status: Published - 24 Aug 2014
Event: The 23rd International Conference on Parallel Architectures and Compilation Techniques - Edmonton, Canada
Duration: 21 Feb 2014 to 25 Feb 2014

Publication series

Name: International Conference on Parallel Architectures and Compilation Techniques
ISSN (Print): 1089-795X


Conference: The 23rd International Conference on Parallel Architectures and Compilation Techniques
Abbreviated title: PACT 2014

Keywords / Materials (for Non-textual outputs)

  • opencl, optimization

