Edinburgh Research Explorer

Moonshine: Distilling with Cheap Convolutions

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

Original language: English
Title of host publication: Thirty-second Conference on Neural Information Processing Systems (NIPS 2018)
Place of Publication: Montreal, Canada
Number of pages: 11
Publication status: Published - 2018
Event: Thirty-second Conference on Neural Information Processing Systems - Montreal, Canada
Duration: 3 Dec 2018 - 8 Dec 2018
Internet address: https://nips.cc/

Conference

Conference: Thirty-second Conference on Neural Information Processing Systems
Abbreviated title: NIPS 2018
Country: Canada
City: Montreal
Period: 3/12/18 - 8/12/18
Internet address: https://nips.cc/

Abstract

Many engineers wish to deploy modern neural networks in memory-limited settings, but the development of flexible methods for reducing memory use is in its infancy, and there is little knowledge of the resulting cost-benefit. We propose structural model distillation for memory reduction using a strategy that produces a student architecture that is a simple transformation of the teacher architecture: no redesign is needed, and the same hyperparameters can be used. Using attention transfer, we provide Pareto curves/tables for distillation of residual networks on four benchmark datasets, indicating the memory versus accuracy payoff. We show that substantial memory savings are possible with very little loss of accuracy, and confirm that distillation yields student networks that perform better than the same architecture trained directly on the data.
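The attention transfer mentioned in the abstract compares spatial attention maps between matching teacher and student layers. As a minimal NumPy sketch, assuming the standard formulation (squared activations averaged over channels, then L2-normalised over spatial positions) rather than the authors' own code, with illustrative function names:

```python
import numpy as np

def attention_map(fmap):
    """Collapse a (C, H, W) feature map to a normalised (H*W,) spatial
    attention vector: mean of squared activations over channels."""
    am = (fmap ** 2).mean(axis=0).ravel()
    norm = np.linalg.norm(am)
    return am / norm if norm > 0 else am

def attention_transfer_loss(student_fmaps, teacher_fmaps):
    """Sum of squared L2 distances between paired attention maps
    from corresponding teacher/student layers."""
    return sum(float(((attention_map(s) - attention_map(t)) ** 2).sum())
               for s, t in zip(student_fmaps, teacher_fmaps))
```

Because the maps are collapsed over channels, the student layers may have fewer channels than the teacher's (the "cheap convolutions" of the title) while the loss remains well defined.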



ID: 75307801