Edinburgh Research Explorer

Automatic Generation of Specialized Direct Convolutions for Mobile GPUs

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Original language: English
Title of host publication: GPGPU '20: Proceedings of the 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit
Number of pages: 10
ISBN (Print): 9781450370257
Publication status: Published - 23 Feb 2020
Event: 13th Workshop on General Purpose Processing Using GPU (GPGPU 2020) @ PPoPP 2020 - San Diego, United States
Duration: 23 Feb 2020 - 23 Feb 2020


Workshop: 13th Workshop on General Purpose Processing Using GPU (GPGPU 2020)
Abbreviated title: GPGPU 2020
Country: United States
City: San Diego


Convolutional Neural Networks (CNNs) are a powerful and versatile tool for performing computer vision tasks in both resource-constrained settings and server-side applications. Most GPU hardware vendors provide highly tuned libraries for CNNs, such as Nvidia's cuDNN or the ARM Compute Library. Such libraries are the basis for higher-level, commonly used machine-learning frameworks such as PyTorch or Caffe, abstracting away vendor-specific implementation details. However, writing optimized parallel code for GPUs is far from trivial. This places a significant burden on hardware-specific library writers, who have to continually play catch-up with rapid hardware and network evolution.

To reduce effort and time to market, new approaches are needed based on automatic code generation rather than manual implementation. This paper describes such an approach for direct convolutions using Lift, a new data-parallel intermediate language and compiler. Lift uses a high-level intermediate language to express algorithms, which are then automatically optimized using a system of rewrite rules. Direct convolution, as opposed to the matrix-multiplication approach commonly used by machine-learning frameworks, uses an order of magnitude less memory, which is critical for mobile devices. Using Lift, we show that it is possible to automatically generate code that is 10× faster than the direct convolution, while using 3.6× less space than the GEMM-based convolution, of the very specialized ARM Compute Library on the latest generation of ARM Mali GPU.
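To make the memory trade-off in the abstract concrete, here is a minimal NumPy sketch (not the Lift-generated GPU code, and not the ARM Compute Library implementation) of the two strategies being contrasted: a direct convolution that reads the input in place, and a GEMM-based convolution that first materializes an im2col buffer replicating each input pixel up to K×K times. All function names are illustrative.

```python
import numpy as np

def direct_conv2d(x, w):
    """Direct 2D convolution (valid padding, stride 1).

    Reads the input in place: the only extra memory is the output
    buffer, which is why the direct approach suits mobile GPUs.
    """
    H, W = x.shape
    K, _ = w.shape
    out = np.zeros((H - K + 1, W - K + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + K, j:j + K] * w)
    return out

def im2col_conv2d(x, w):
    """GEMM-based convolution: build an im2col matrix with one row per
    output pixel, then reduce it with a matrix-vector product.

    The im2col buffer holds (out_h * out_w) * K * K values, i.e. each
    input pixel is copied up to K*K times -- the memory overhead that
    direct convolution avoids.
    """
    H, W = x.shape
    K, _ = w.shape
    oh, ow = H - K + 1, W - K + 1
    cols = np.empty((oh * ow, K * K))
    for i in range(oh):
        for j in range(ow):
            cols[i * ow + j] = x[i:i + K, j:j + K].ravel()
    return (cols @ w.ravel()).reshape(oh, ow)
```

Both functions compute the same result; for a 6×6 input and a 3×3 kernel the im2col buffer already holds 16 × 9 = 144 values versus the 36 input values read in place by the direct version, and the gap grows with kernel size.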

Research areas

• code generation, convolution, mobile GPU, parallelism


