Automatic Generation of Specialized Direct Convolutions for Mobile GPUs

Naums Mogers, Valentin Radu, Lu Li, Jack Turner, Michael O'Boyle, Christophe Dubach

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution


Convolutional Neural Networks (CNNs) are a powerful and versatile tool for performing computer vision tasks in both resource-constrained settings and server-side applications. Most GPU hardware vendors provide highly tuned libraries for CNNs such as Nvidia’s cuDNN or ARM Compute Library. Such libraries are the basis for higher-level, commonly used machine-learning frameworks such as PyTorch or Caffe, abstracting them away from vendor-specific implementation details. However, writing optimized parallel code for GPUs is far from trivial. This places a significant burden on hardware-specific library writers, who have to continually play catch-up with rapid hardware and network evolution.

To reduce effort and time to market, new approaches are needed based on automatic code generation, rather than manual implementation. This paper describes such an approach for direct convolutions using Lift, a new data-parallel intermediate language and compiler. Lift uses a high-level intermediate language to express algorithms which are then automatically optimized using a system of rewrite rules. Direct convolution, as opposed to the matrix multiplication approach commonly used by machine-learning frameworks, uses an order of magnitude less memory, which is critical for mobile devices. Using Lift, we show that it is possible to automatically generate code that is 10× faster than the direct convolution, and uses 3.6× less space than the GEMM-based convolution, of the very specialized ARM Compute Library on the latest generation of ARM Mali GPU.
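The memory argument in the abstract can be illustrated with a toy example. The sketch below is not Lift-generated code and not the ARM Compute Library implementation; it is a minimal single-channel, stride-1, unpadded convolution in plain Python, with hypothetical function names (`direct_conv2d`, `im2col_conv2d`), written only to show why a matrix-multiplication formulation needs an extra patch buffer that the direct formulation avoids.

```python
def direct_conv2d(image, kernel):
    """Direct convolution: reads the input in place, no intermediate buffer."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[y + i][x + j] * kernel[i][j]
            out[y][x] = acc
    return out


def im2col_conv2d(image, kernel):
    """Matrix-multiplication style: copy every patch into a matrix first."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    # Patch matrix: one row per output pixel, kh*kw columns (the extra buffer).
    patches = [
        [image[y + i][x + j] for i in range(kh) for j in range(kw)]
        for y in range(out_h)
        for x in range(out_w)
    ]
    flat_kernel = [v for row in kernel for v in row]
    flat_out = [sum(p * k for p, k in zip(patch, flat_kernel)) for patch in patches]
    out = [flat_out[r * out_w:(r + 1) * out_w] for r in range(out_h)]
    return out, len(patches) * len(flat_kernel)


# Toy 6x6 input and 3x3 kernel (illustrative values, not from the paper).
image = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
kernel = [[1.0, 0.0, -1.0]] * 3

direct = direct_conv2d(image, kernel)
via_gemm, buffer_elems = im2col_conv2d(image, kernel)
input_elems = len(image) * len(image[0])
print(direct == via_gemm, input_elems, buffer_elems)
```

Both functions compute the same output, but the second one materializes a patch matrix whose size grows with the kernel area times the number of output pixels. The ratio in this toy case says nothing about the figures reported in the paper, which depend on real layer shapes, channel counts, and the libraries measured.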
Original language: English
Title of host publication: GPGPU '20: Proceedings of the 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit
Number of pages: 10
ISBN (Print): 9781450370257
Publication status: Published - 23 Feb 2020
Event: 13th Workshop on General Purpose Processing Using GPU (GPGPU 2020), at PPoPP 2020 - San Diego, United States
Duration: 23 Feb 2020


Workshop: 13th Workshop on General Purpose Processing Using GPU (GPGPU 2020)
Abbreviated title: GPGPU 2020
Country/Territory: United States
City: San Diego


Keywords

  • code generation
  • convolution
  • mobile GPU
  • parallelism


