Abstract
OpenCL has been designed to achieve functional portability across multi-core devices from different vendors. However, the lack of a single cross-target optimizing compiler severely limits performance portability of OpenCL programs. Programmers need to manually tune applications for each specific device, preventing effective portability. We target a compiler transformation specific to data-parallel languages, thread coarsening, and show that it can improve performance across different GPU devices. We then address the problem of selecting the best value for the coarsening factor parameter, i.e., deciding how many threads to merge together. We experimentally show that this is a hard problem to solve: good configurations are difficult to find, and naive coarsening in fact leads to substantial slowdowns. We propose a solution based on a machine-learning model that predicts the best coarsening factor using kernel-function static features. The model automatically specializes to the different architectures considered. We evaluate our approach on 17 benchmarks on four devices: two Nvidia GPUs and two different generations of AMD GPUs. Using our technique, we achieve speedups between 1.11X and 1.33X on average.
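To make the idea of a coarsening factor concrete, the following is a minimal sketch in plain Python rather than OpenCL. It emulates a one-dimensional kernel launch sequentially and compares an element-wise addition kernel with a hand-coarsened version in which each work-item does the work of `factor` original work-items. All names here (`launch`, `vec_add`, `vec_add_coarsened`, `run_original`, `run_coarsened`) are hypothetical, the strided element mapping is an assumption chosen for illustration, and none of this reproduces the compiler pass or the prediction model described in the abstract.

```python
# Sequential emulation of a 1-D data-parallel launch.
# Illustration only: not OpenCL, and not the paper's compiler transformation.


def launch(kernel, global_size, *args):
    """Call `kernel` once per work-item id, standing in for a GPU launch."""
    for gid in range(global_size):
        kernel(gid, *args)


def vec_add(gid, a, b, out):
    """Original kernel: one work-item computes one output element."""
    out[gid] = a[gid] + b[gid]


def vec_add_coarsened(gid, factor, n, a, b, out):
    """Coarsened kernel: one work-item does the work of `factor` original ones.

    Elements are visited with a stride equal to the coarsened launch size
    (an assumption of this sketch), so neighbouring work-items still touch
    neighbouring elements.
    """
    stride = n // factor  # number of work-items in the coarsened launch
    for k in range(factor):
        i = gid + k * stride
        out[i] = a[i] + b[i]


def run_original(a, b):
    out = [0] * len(a)
    launch(vec_add, len(a), a, b, out)
    return out


def run_coarsened(a, b, factor):
    n = len(a)
    if factor < 1 or n % factor != 0:
        raise ValueError("this sketch assumes n is a positive multiple of factor")
    out = [0] * n
    # The launch shrinks by `factor`; each remaining work-item does more work.
    launch(vec_add_coarsened, n // factor, factor, n, a, b, out)
    return out
```

Both versions compute the same result; what changes is the number of work-items launched and the amount of work per work-item. On a real device that trade-off affects performance, which is why the choice of factor matters.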
Original language | English |
---|---|
Title of host publication | PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation |
Place of Publication | New York, NY, USA |
Publisher | ACM |
Pages | 455-466 |
Number of pages | 12 |
ISBN (Print) | 978-1-4503-2809-8 |
Publication status | Published - 24 Aug 2014 |
Event | The 23rd International Conference on Parallel Architectures and Compilation Techniques, Edmonton, Canada. Duration: 21 Feb 2014 → 25 Feb 2014. http://pact2014.pactconf.org/ |
Publication series
Name | International Conference on Parallel Architectures and Compilation Techniques |
---|---|
Publisher | ACM |
ISSN (Print) | 1089-795X |
Conference
Conference | The 23rd International Conference on Parallel Architectures and Compilation Techniques |
---|---|
Abbreviated title | PACT 2014 |
Country/Territory | Canada |
City | Edmonton |
Period | 21/02/14 → 25/02/14 |
Keywords
- OpenCL, optimization
Fingerprint
Dive into the research topics of 'Automatic Optimization of Thread-coarsening for Graphics Processors'. Together they form a unique fingerprint.
Profiles
Michael O'Boyle
- School of Informatics - Personal Chair in Computer Science
- Institute for Computing Systems Architecture
- Computer Systems
Person: Academic: Research Active