BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Multi-task learning shares information between related tasks, sometimes reducing the number of parameters required. State-of-the-art results across multiple natural language understanding tasks in the GLUE benchmark have previously used transfer from a single large task: unsupervised pre-training with BERT, where a separate BERT model was fine-tuned for each task. We explore multi-task approaches that share a single BERT model with a small number of additional task-specific parameters. Using new adaptation modules, PALs or ‘projected attention layers’, we match the performance of separately fine-tuned models on the GLUE benchmark with ≈7 times fewer parameters, and obtain state-of-the-art results on the Recognizing Textual Entailment dataset.
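The core idea of a projected attention layer can be sketched as: project the hidden states down to a small dimension, run self-attention in that small space, then project back up and add the result residually to the shared layer's output. The following is a minimal, illustrative numpy sketch, not the authors' implementation; the single attention head, the random initialisation, and the exact dimensions are assumptions for demonstration (the shared BERT-base hidden size is 768, and a small projected size is used).

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pal_block(h, V_down, V_up, Wq, Wk, Wv):
    """Projected attention layer (single head for brevity):
    down-project, self-attend in the small space, up-project."""
    z = h @ V_down                                   # (seq, d_small)
    q, k, v = z @ Wq, z @ Wk, z @ Wv                 # queries, keys, values
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))   # (seq, seq) attention weights
    return (attn @ v) @ V_up                         # back to (seq, d_model)

# Illustrative dimensions: BERT-base hidden size 768, small projected size.
rng = np.random.default_rng(0)
d_model, d_small, seq = 768, 204, 4
h = rng.standard_normal((seq, d_model))              # hidden states from the shared layer
V_down = rng.standard_normal((d_model, d_small)) * 0.02
V_up = rng.standard_normal((d_small, d_model)) * 0.02
Wq, Wk, Wv = (rng.standard_normal((d_small, d_small)) * 0.02 for _ in range(3))

# Task-specific PAL output added residually to the shared representation.
out = h + pal_block(h, V_down, V_up, Wq, Wk, Wv)
```

Only the small projection and attention matrices are task-specific, which is what keeps the per-task parameter count low relative to fine-tuning a full separate BERT model.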
Original language: English
Title of host publication: Proceedings of the 36th International Conference on Machine Learning (ICML)
Editors: Kamalika Chaudhuri, Ruslan Salakhutdinov
Place of Publication: Long Beach, USA
Number of pages: 12
Publication status: E-pub ahead of print - 3 Jul 2019
Event: Thirty-sixth International Conference on Machine Learning - Long Beach Convention Center, Long Beach, United States
Duration: 9 Jun 2019 – 15 Jun 2019
Conference number: 36

Publication series

Name: Proceedings of Machine Learning Research
ISSN (Electronic): 2640-3498


Conference: Thirty-sixth International Conference on Machine Learning
Abbreviated title: ICML 2019
Country: United States
City: Long Beach
