BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Multi-task learning shares information between related tasks, sometimes reducing the number of parameters required. State-of-the-art results across multiple natural language understanding tasks in the GLUE benchmark have previously used transfer from a single large task: unsupervised pre-training with BERT, where a separate BERT model was fine-tuned for each task. We explore multi-task approaches that share a single BERT model with a small number of additional task-specific parameters. Using new adaptation modules, PALs or ‘projected attention layers’, we match the performance of separately fine-tuned models on the GLUE benchmark with ≈7 times fewer parameters, and obtain state-of-the-art results on the Recognizing Textual Entailment dataset.
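The core idea of a projected attention layer can be sketched as: project the hidden states down to a small dimension, run self-attention in that small space, then project back up and add the result residually to the shared layer's output. The following is a minimal, illustrative numpy sketch, not the authors' implementation; the single attention head, the random initialisation, and the exact dimensions are assumptions for demonstration (the shared BERT-base hidden size is 768, and a small projected size is used).

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pal_block(h, V_down, V_up, Wq, Wk, Wv):
    """Projected attention layer (single head for brevity):
    down-project, self-attend in the small space, up-project."""
    z = h @ V_down                                   # (seq, d_small)
    q, k, v = z @ Wq, z @ Wk, z @ Wv                 # queries, keys, values
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))   # (seq, seq) attention weights
    return (attn @ v) @ V_up                         # back to (seq, d_model)

# Illustrative dimensions: BERT-base hidden size 768, small projected size.
rng = np.random.default_rng(0)
d_model, d_small, seq = 768, 204, 4
h = rng.standard_normal((seq, d_model))              # hidden states from the shared layer
V_down = rng.standard_normal((d_model, d_small)) * 0.02
V_up = rng.standard_normal((d_small, d_model)) * 0.02
Wq, Wk, Wv = (rng.standard_normal((d_small, d_small)) * 0.02 for _ in range(3))

# Task-specific PAL output added residually to the shared representation.
out = h + pal_block(h, V_down, V_up, Wq, Wk, Wv)
```

Only the small projection and attention matrices are task-specific, which is what keeps the per-task parameter count low relative to fine-tuning a full separate BERT model.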
Original language: English
Title of host publication: Proceedings of the 36th International Conference on Machine Learning (ICML)
Editors: Kamalika Chaudhuri, Ruslan Salakhutdinov
Place of Publication: Long Beach, USA
Number of pages: 12
Publication status: E-pub ahead of print - 3 Jul 2019
Event: Thirty-sixth International Conference on Machine Learning - Long Beach Convention Center, Long Beach, United States
Duration: 9 Jun 2019 – 15 Jun 2019
Conference number: 36

Publication series

Name: Proceedings of Machine Learning Research
ISSN (Electronic): 2640-3498


Conference: Thirty-sixth International Conference on Machine Learning
Abbreviated title: ICML 2019
Country: United States
City: Long Beach
