Helium: a transparent inter-kernel optimizer for OpenCL

Thibaut Lutz, Christian Fensch, Murray Cole

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


State-of-the-art automatic optimization of OpenCL applications focuses on improving the performance of individual compute kernels. Programmers address opportunities for inter-kernel optimization in specific applications by ad-hoc hand tuning: manually fusing kernels together. However, the complexity of interactions between host and kernel code makes this approach weak or even unviable for applications involving more than a small number of kernel invocations or a highly dynamic control flow, leaving substantial potential opportunities unexplored. It also leads to an overly complex, hard-to-maintain code base. We present Helium, a transparent OpenCL overlay which discovers, manipulates and exploits opportunities for inter- and intra-kernel optimization. Helium is implemented as a preloaded library and uses a delay-optimize-replay mechanism in which kernel calls are intercepted, collectively optimized, and then executed according to an improved execution plan. This allows us to benefit from composite optimizations, on large, dynamically complex applications, with no impact on the code base. Our results show that Helium obtains performance at least equal to, and frequently better than, that of carefully hand-tuned code. Helium outperforms hand-optimized code where the exact dynamic composition of compute kernels cannot be known statically. In these cases, we demonstrate speedups of up to 3x over unoptimized code and an average speedup of 1.4x over hand-optimized code.
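The delay-optimize-replay mechanism described in the abstract can be sketched in miniature: intercepted kernel enqueues are recorded rather than executed, a simple fusion pass merges a producer kernel into an element-wise consumer that reads its output, and the improved plan is then replayed. The sketch below is an illustrative toy model only; the class and field names (`KernelCall`, `DelayOptimizeReplay`, `reads`, `writes`) are hypothetical and the fusion rule is far simpler than the paper's actual analysis.

```python
from dataclasses import dataclass

@dataclass
class KernelCall:
    # Hypothetical record of one intercepted kernel enqueue (names are
    # illustrative, not Helium's internal representation).
    name: str
    reads: set
    writes: set
    elementwise: bool = True

class DelayOptimizeReplay:
    """Toy model of a delay-optimize-replay pipeline: a minimal sketch,
    not the paper's implementation."""

    def __init__(self):
        self.pending = []  # delayed (recorded, not yet executed) kernel calls

    def enqueue(self, call):
        # Delay: record the call instead of executing it immediately.
        self.pending.append(call)

    def optimize(self):
        # Fuse a producer into its element-wise consumer whenever the
        # consumer reads a buffer the producer wrote.
        plan = []
        for call in self.pending:
            if (plan and plan[-1].elementwise and call.elementwise
                    and plan[-1].writes & call.reads):
                prev = plan.pop()
                plan.append(KernelCall(
                    name=f"{prev.name}+{call.name}",
                    reads=prev.reads | (call.reads - prev.writes),
                    writes=prev.writes | call.writes))
            else:
                plan.append(call)
        return plan

    def replay(self):
        # Replay: execute the improved plan (here we just report kernel names).
        plan = self.optimize()
        self.pending = []
        return [c.name for c in plan]

runtime = DelayOptimizeReplay()
runtime.enqueue(KernelCall("scale", reads={"a"}, writes={"b"}))
runtime.enqueue(KernelCall("add", reads={"b", "c"}, writes={"d"}))
print(runtime.replay())  # → ['scale+add']
```

In the real system the interception happens transparently, via a library preloaded under the host program, so the application's code base is untouched; the fusion decision would additionally have to check work sizes, argument compatibility and synchronization points before merging kernels.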
Original language: English
Title of host publication: GPGPU 2015: Proceedings of the 8th Workshop on General Purpose Processing using GPUs
Number of pages: 11
ISBN (Print): 978-1-4503-3407-5
Publication status: Published - 2015

