Knowledge Distillation for Multi-task Learning

Weihong Li*, Hakan Bilen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution


Multi-task learning (MTL) aims to learn a single model that performs multiple tasks, achieving good performance on all of them at a lower computational cost. Learning such a model requires jointly optimizing the losses of a set of tasks with different difficulty levels, magnitudes, and characteristics (e.g. cross-entropy, Euclidean loss), which leads to an imbalance problem in multi-task learning. To address this imbalance problem, we propose a knowledge distillation based method. We first learn a task-specific model for each task. We then train the multi-task model to minimize the task-specific losses and to produce the same features as the task-specific models. Because each task-specific network encodes different features, we introduce small task-specific adaptors that project the multi-task features to the task-specific features. In this way, the adaptors align the task-specific and multi-task features, which enables balanced parameter sharing across tasks. Extensive experimental results demonstrate that our method optimizes a multi-task learning model in a more balanced way and achieves better overall performance.
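The training objective described in the abstract can be illustrated with a minimal sketch in plain Python. The abstract does not give the exact formulation, so several details here are assumptions made for illustration only: the adaptors are modeled as plain linear maps, features are L2-normalized before comparison, the distance is squared Euclidean, and a single weight `lam` balances the two terms. All function and variable names are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def apply_adaptor(weight, feature):
    """Assumed linear adaptor: project the multi-task feature with a task-specific matrix."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weight]


def distillation_loss(adapted, teacher):
    """Squared Euclidean distance between L2-normalized features (an assumed choice)."""
    a, t = l2_normalize(adapted), l2_normalize(teacher)
    return sum((x - y) ** 2 for x, y in zip(a, t))


def multitask_objective(task_losses, shared_feature, adaptors, teacher_features, lam=1.0):
    """Sum, over tasks, of the task loss plus a weighted feature-matching term.

    task_losses:      task name -> already-computed task-specific loss value
    shared_feature:   feature produced by the multi-task network
    adaptors:         task name -> adaptor matrix (list of rows)
    teacher_features: task name -> feature from the frozen task-specific network
    lam:              hypothetical balancing weight, not taken from the source
    """
    total = 0.0
    for task, task_loss in task_losses.items():
        adapted = apply_adaptor(adaptors[task], shared_feature)
        total += task_loss + lam * distillation_loss(adapted, teacher_features[task])
    return total


# Toy example with two tasks and a 2-dimensional shared feature.
shared = [1.0, 2.0]
adaptors = {"seg": [[1.0, 0.0], [0.0, 1.0]], "depth": [[0.0, 1.0], [1.0, 0.0]]}
teachers = {"seg": [2.0, 4.0], "depth": [2.0, 1.0]}
losses = {"seg": 0.5, "depth": 0.25}
objective = multitask_objective(losses, shared, adaptors, teachers)
```

In this toy example each adaptor maps the shared feature onto the direction of its teacher feature, so the feature-matching terms vanish and only the task losses remain. In a real setup the adaptors and the multi-task network would be trained jointly by gradient descent while the task-specific networks stay fixed.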

Original language: English
Title of host publication: Computer Vision – ECCV 2020 Workshops
Editors: Adrien Bartoli, Andrea Fusiello
Number of pages: 14
ISBN (Electronic): 978-3-030-65414-6
ISBN (Print): 978-3-030-65413-9
Publication status: Published - 5 Jan 2021
Event: Workshops held at the 16th European Conference on Computer Vision - Glasgow, United Kingdom
Duration: 23 Aug 2020 to 28 Aug 2020

Publication series

Name: Lecture Notes in Computer Science
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: Workshops held at the 16th European Conference on Computer Vision
Abbreviated title: ECCV 2020
Country: United Kingdom