Edinburgh Research Explorer

BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Open Access permissions: Open

Documents

https://arxiv.org/abs/1902.02671
Original language: English
Title of host publication: Proceedings of the 36th International Conference on Machine Learning (ICML)
Place of Publication: Long Beach, USA
Publisher: PMLR
Number of pages: 12
State: Accepted/In press - 21 Apr 2019
Event: Thirty-sixth International Conference on Machine Learning - Long Beach, United States
Duration: 9 Jun 2019 - 15 Jun 2019
https://icml.cc/

Publication series

Name: PMLR
Volume: 97

Conference

Conference: Thirty-sixth International Conference on Machine Learning
Abbreviated title: ICML 2019
Country: United States
City: Long Beach
Period: 9/06/19 - 15/06/19
Internet address: https://icml.cc/

Abstract

Multi-task learning shares information between related tasks, sometimes reducing the number of parameters required. State-of-the-art results across multiple natural language understanding tasks in the GLUE benchmark have previously used transfer from a single large task: unsupervised pre-training with BERT, where a separate BERT model was fine-tuned for each task. We explore multi-task approaches that share a single BERT model with a small number of additional task-specific parameters. Using new adaptation modules, PALs or ‘projected attention layers’, we match the performance of separately fine-tuned models on the GLUE benchmark with ≈7 times fewer parameters, and obtain state-of-the-art results on the Recognizing Textual Entailment dataset.
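
The abstract describes the PAL module only at a high level. As a rough, hypothetical sketch (not the authors' released code), the PyTorch snippet below illustrates the general idea: a task-specific adapter that projects hidden states down to a small dimension, applies multi-head self-attention there, projects back up, and is added in parallel with a shared encoder layer. All class names, the dimensions (hidden size 768, projection size 204, 12 heads), and the exact placement of the residual sum and layer norm are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ProjectedAttentionLayer(nn.Module):
    """Hypothetical task-specific adapter: project down, attend, project up."""

    def __init__(self, hidden_size=768, proj_size=204, num_heads=12):
        super().__init__()
        self.down = nn.Linear(hidden_size, proj_size)    # project to small dim
        self.attn = nn.MultiheadAttention(proj_size, num_heads, batch_first=True)
        self.up = nn.Linear(proj_size, hidden_size)      # project back up

    def forward(self, hidden_states):
        low = self.down(hidden_states)
        attn_out, _ = self.attn(low, low, low)           # self-attention in the small dim
        return self.up(attn_out)


class AdaptedEncoderLayer(nn.Module):
    """Adds the task-specific PAL output in parallel with a shared encoder layer.

    The wrapped layer is any module mapping (batch, seq, hidden) -> (batch, seq, hidden);
    a real BERT layer from a library would need a thin wrapper around its tuple output.
    """

    def __init__(self, shared_layer, hidden_size=768):
        super().__init__()
        self.shared_layer = shared_layer                 # shared across all tasks
        self.pal = ProjectedAttentionLayer(hidden_size)  # one per task
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, hidden_states):
        shared = self.shared_layer(hidden_states)
        task_specific = self.pal(hidden_states)
        return self.norm(shared + task_specific)


# Illustrative usage with a stand-in for a shared BERT layer.
dummy_shared_layer = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
layer = AdaptedEncoderLayer(dummy_shared_layer)
x = torch.randn(2, 16, 768)                              # (batch, sequence, hidden)
print(layer(x).shape)                                    # torch.Size([2, 16, 768])
```

In this sketch only the per-task PAL parameters (two small linear projections plus a low-dimensional attention block) are new for each task, which is what keeps the total parameter count close to that of a single shared BERT model.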
