BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning

Asa Cooper Stickland, Iain Murray

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Multi-task learning shares information between related tasks, sometimes reducing the number of parameters required. State-of-the-art results across multiple natural language understanding tasks in the GLUE benchmark have previously used transfer from a single large task: unsupervised pre-training with BERT, where a separate BERT model was fine-tuned for each task. We explore multi-task approaches that share a single BERT model with a small number of additional task-specific parameters. Using new adaptation modules, PALs or ‘projected attention layers’, we match the performance of separately fine-tuned models on the GLUE benchmark with ≈7 times fewer parameters, and obtain state-of-the-art results on the Recognizing Textual Entailment dataset.
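The abstract describes the adaptation scheme only at a high level. Below is a minimal PyTorch sketch of the projected-attention idea: each task gets a small attention module that projects the hidden states down to a low dimension, attends in that space, and projects back up, combined with the output of the shared BERT layer. The hidden size (768), projection size (204), head count, and the exact residual wiring here are illustrative assumptions, not the authors' reported configuration.

import torch
import torch.nn as nn

class ProjectedAttentionLayer(nn.Module):
    """Task-specific adapter: project down, attend, project back up.
    Dimensions are assumptions chosen for illustration."""
    def __init__(self, hidden_size=768, proj_size=204, num_heads=12):
        super().__init__()
        self.down = nn.Linear(hidden_size, proj_size, bias=False)   # encoder projection
        self.attn = nn.MultiheadAttention(proj_size, num_heads, batch_first=True)
        self.up = nn.Linear(proj_size, hidden_size, bias=False)     # decoder projection

    def forward(self, hidden_states):
        x = self.down(hidden_states)       # (batch, seq, proj_size)
        x, _ = self.attn(x, x, x)          # self-attention in the low-dimensional space
        return self.up(x)                  # back to (batch, seq, hidden_size)

class AdaptedBertSubLayer(nn.Module):
    """Shared BERT self-attention sub-layer plus a parallel task-specific PAL.
    `shared_self_attention` stands in for the shared BERT attention block:
    any callable mapping (batch, seq, hidden) -> (batch, seq, hidden)."""
    def __init__(self, shared_self_attention, hidden_size=768):
        super().__init__()
        self.self_attention = shared_self_attention   # shared across all tasks
        self.pal = ProjectedAttentionLayer(hidden_size)
        self.layer_norm = nn.LayerNorm(hidden_size)

    def forward(self, hidden_states):
        sa_out = self.self_attention(hidden_states)
        # Add the small task-specific path in parallel with the shared path.
        return self.layer_norm(hidden_states + sa_out + self.pal(hidden_states))

Only the PAL and layer-norm parameters would be task-specific in this sketch; the shared self-attention is reused by every task, which is where the parameter saving comes from.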
Original language: English
Title of host publication: Proceedings of the 36th International Conference on Machine Learning (ICML)
Editors: Kamalika Chaudhuri, Ruslan Salakhutdinov
Place of Publication: Long Beach, USA
Publisher: PMLR
Pages: 5986-5995
Number of pages: 12
Volume: 97
Publication status: E-pub ahead of print - 3 Jul 2019
Event: Thirty-sixth International Conference on Machine Learning - Long Beach Convention Center, Long Beach, United States
Duration: 9 Jun 2019 - 15 Jun 2019
Conference number: 36
https://icml.cc/Conferences/2019

Publication series

Name: Proceedings of Machine Learning Research
Publisher: PMLR
Volume: 97
ISSN (Electronic): 2640-3498

Conference

Conference: Thirty-sixth International Conference on Machine Learning
Abbreviated title: ICML 2019
Country/Territory: United States
City: Long Beach
Period: 9/06/19 - 15/06/19
Internet address: https://icml.cc/Conferences/2019
