Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Valentin Radu, Kuba Kaszyk, Yuan Wen, Jack Turner, Jose Cano, Elliot Crowley, Bjoern Franke, Amos Storkey, Michael O'Boyle

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Convolutional Neural Networks (CNNs) are becoming a common presence in many applications and services, due to their superior recognition accuracy. They are increasingly being used on mobile devices, often simply by porting large models designed for server space, although several model compression techniques have been considered. One model compression technique intended to reduce computation is channel pruning. Mobile and embedded systems now have GPUs, which are ideal for the parallel computations of neural networks and for their lower energy cost per operation. Specialized libraries perform these neural network computations through highly optimized routines. As we find in our experiments, these libraries are optimized for the most common network shapes, making uninstructed channel pruning inefficient. We evaluate higher-level libraries which analyze the input characteristics of a convolutional layer and, based on these, produce optimized OpenCL (Arm Compute Library and TVM) and CUDA (cuDNN) code. However, in reality, these characteristics and the subsequent choices intended for optimization can have the opposite effect. We show that a reduction in the number of convolutional channels, pruning 12% of the initial size, is in some cases detrimental to performance, leading to a 2× slowdown. On the other hand, we also find examples where performance-aware pruning achieves the intended results, with performance speedups of 3× with cuDNN and above 10× with Arm Compute Library and TVM. Our findings expose the need for hardware-instructed neural network pruning.
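As a rough illustration of the effect described in the abstract (not code from the paper), the sketch below uses PyTorch to time a single convolution before and after removing roughly 12% of its output channels. The layer shape, the CPU timing harness, and the chosen channel counts are all assumptions for illustration; the paper's measurements target embedded GPU backends (Arm Compute Library, TVM, cuDNN).

```python
import time
import torch
import torch.nn as nn

def time_conv(in_ch, out_ch, spatial=56, iters=50):
    """Average forward-pass latency of a single 3x3 convolution.
    Runs on CPU here; the paper's experiments target embedded GPU backends."""
    conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
    x = torch.randn(1, in_ch, spatial, spatial)
    with torch.no_grad():
        conv(x)  # warm-up
        start = time.perf_counter()
        for _ in range(iters):
            conv(x)
        return (time.perf_counter() - start) / iters

# Baseline layer vs. the same layer with ~12% of output channels pruned.
# Fewer channels is not guaranteed to be faster: backend kernels are tuned
# for common shapes, so the measured latency is what actually matters.
baseline = time_conv(128, 128)
pruned = time_conv(128, 113)   # 128 * (1 - 0.12) ≈ 113 output channels
print(f"baseline: {baseline*1e3:.2f} ms, pruned: {pruned*1e3:.2f} ms")
```

On a given backend the pruned layer may land on a less-optimized kernel shape and run slower than the baseline, which is the motivation for performance-aware (hardware-instructed) pruning.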
Original language: English
Title of host publication: 2019 IEEE International Symposium on Workload Characterization (IISWC)
Place of publication: Orlando, FL, USA
Publisher: Institute of Electrical and Electronics Engineers
Pages: 24-34
Number of pages: 11
ISBN (electronic): 978-1-7281-4045-2
ISBN (print): 978-1-7281-4046-9
DOIs
Publication status: Published - 19 Mar 2020
Event: 2019 IEEE International Symposium on Workload Characterization - Orlando, United States
Duration: 3 Nov 2019 - 5 Nov 2019
http://www.iiswc.org/iiswc2019/index.html

Conference

Conference: 2019 IEEE International Symposium on Workload Characterization
Abbreviated title: IISWC-2019
Country/Territory: United States
City: Orlando
Period: 3/11/19 - 5/11/19
Internet address: http://www.iiswc.org/iiswc2019/index.html

Keywords

  • convolutional neural networks
  • channel pruning
  • embedded GPU
