Edinburgh Research Explorer

Automatic Generation of Specialized Direct Convolutions for Mobile GPUs

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Open Access permissions: Open

Documents

https://dl.acm.org/doi/abs/10.1145/3366428.3380771
Original language: English
Title of host publication: GPGPU '20: Proceedings of the 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit
Publisher: ACM
Pages: 41-50
Number of pages: 10
ISBN (Print): 9781450370257
DOIs: 10.1145/3366428.3380771
Publication status: Published - 23 Feb 2020
Event: 13th Workshop on General Purpose Processing Using GPU (GPGPU 2020) @ PPoPP 2020 - San Diego, United States
Duration: 23 Feb 2020 - 23 Feb 2020
https://insight-archlab.github.io/gpgpu.html

Workshop

Workshop: 13th Workshop on General Purpose Processing Using GPU (GPGPU 2020)
Abbreviated title: GPGPU 2020
Country: United States
City: San Diego
Period: 23/02/20 - 23/02/20
Internet address: https://insight-archlab.github.io/gpgpu.html

Abstract

Convolutional Neural Networks (CNNs) are a powerful and versatile tool for performing computer vision tasks in both resource-constrained settings and server-side applications. Most GPU hardware vendors provide highly tuned libraries for CNNs, such as Nvidia's cuDNN or the ARM Compute Library. Such libraries are the basis for higher-level, commonly used machine-learning frameworks such as PyTorch or Caffe, abstracting them away from vendor-specific implementation details. However, writing optimized parallel code for GPUs is far from trivial. This places a significant burden on hardware-specific library writers, who have to continually play catch-up with rapid hardware and network evolution.

To reduce effort and time to market, new approaches are needed based on automatic code generation rather than manual implementation. This paper describes such an approach for direct convolutions using Lift, a new data-parallel intermediate language and compiler. Lift uses a high-level intermediate language to express algorithms, which are then automatically optimized using a system of rewrite rules. Direct convolution, as opposed to the matrix-multiplication approach commonly used by machine-learning frameworks, uses an order of magnitude less memory, which is critical for mobile devices. Using Lift, we show that it is possible to automatically generate code that is 10× faster than the direct convolution of the highly specialized ARM Compute Library, while using 3.6× less space than its GEMM-based convolution, on the latest generation of ARM Mali GPU.
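For illustration only, and not the Lift-generated OpenCL described in the paper, the following plain C sketch shows what a direct convolution loop nest looks like and why it needs no extra buffers, whereas the im2col + GEMM lowering first materialises a patch matrix of roughly (C_in·K·K) × (H_out·W_out) elements. All identifiers below are hypothetical.

/* Minimal sketch of a direct 2D convolution (valid padding, stride 1).
 * It reads the input in place, so the only buffers are input, weights
 * and output; the im2col lowering would instead allocate a patch matrix
 * roughly K*K times the size of the input. */
#include <stdio.h>
#include <stdlib.h>

static void direct_conv2d(const float *in, const float *w, float *out,
                          int c_in, int h, int wdt,
                          int c_out, int k)
{
    int h_out = h - k + 1, w_out = wdt - k + 1;
    for (int co = 0; co < c_out; ++co)
        for (int y = 0; y < h_out; ++y)
            for (int x = 0; x < w_out; ++x) {
                float acc = 0.0f;
                /* Accumulate over input channels and the KxK window. */
                for (int ci = 0; ci < c_in; ++ci)
                    for (int ky = 0; ky < k; ++ky)
                        for (int kx = 0; kx < k; ++kx)
                            acc += in[(ci * h + (y + ky)) * wdt + (x + kx)]
                                 * w[((co * c_in + ci) * k + ky) * k + kx];
                out[(co * h_out + y) * w_out + x] = acc;
            }
}

int main(void)
{
    /* Tiny example: 3x8x8 input, 4 output channels, 3x3 kernels. */
    int c_in = 3, h = 8, wdt = 8, c_out = 4, k = 3;
    int h_out = h - k + 1, w_out = wdt - k + 1;
    float *in  = calloc((size_t)c_in * h * wdt, sizeof *in);
    float *wts = calloc((size_t)c_out * c_in * k * k, sizeof *wts);
    float *out = malloc((size_t)c_out * h_out * w_out * sizeof *out);
    if (!in || !wts || !out) return 1;
    in[0] = 1.0f; wts[0] = 2.0f;        /* trivial data for a sanity check */
    direct_conv2d(in, wts, out, c_in, h, wdt, c_out, k);
    printf("out[0] = %f\n", out[0]);    /* expect 2.0 */
    free(in); free(wts); free(out);
    return 0;
}

In the paper's setting, the interesting part is not this naive loop nest but how Lift's rewrite rules restructure it (tiling, vectorisation, parallel mapping) into efficient OpenCL for the Mali GPU; the sketch only clarifies the memory argument for preferring direct convolution over im2col + GEMM on mobile devices.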

    Research areas

  • code generation, convolution, mobile GPU, parallelism

Event

13th Workshop on General Purpose Processing Using GPU (GPGPU 2020) : @ PPoPP 2020

23/02/2023/02/20

San Diego, United States

Event: Workshop


ID: 133591456