Abstract
Many recent works use "consistency regularisation" to improve the generalisation of fine-tuned pre-trained models, both multilingual and English-only. These works encourage model outputs to be similar for a perturbed and an unperturbed version of the same input, usually by penalising the Kullback–Leibler (KL) divergence between the model's output distributions on the perturbed and normal inputs. We believe that consistency losses may be implicitly regularising the loss landscape. In particular, we build on work hypothesising that implicitly or explicitly regularising the trace of the Fisher Information Matrix (FIM) amplifies the implicit bias of SGD to avoid memorisation. Our initial results show, both empirically and theoretically, that consistency losses are related to the FIM, and that the flat minima implied by a small trace of the FIM improve performance when fine-tuning a multilingual model on additional languages. We aim to confirm these initial results on more datasets and to use our insights to develop better multilingual fine-tuning techniques.
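As a rough illustration of the kind of consistency loss described above, the snippet below penalises the KL divergence between a model's output distributions on a normal and a perturbed copy of the same input. This is a minimal sketch under assumed PyTorch / Hugging Face-style interfaces, not the authors' implementation; `model`, `perturb`, `task_loss_fn`, and the weight `lam` are hypothetical placeholders, and the direction of the KL term is one of several reasonable choices.

```python
# Minimal sketch of a consistency-regularised training step (assumed interfaces,
# not the paper's code).
import torch
import torch.nn.functional as F


def consistency_loss(logits_clean: torch.Tensor, logits_perturbed: torch.Tensor) -> torch.Tensor:
    """KL(p_perturbed || p_clean), averaged over the batch."""
    log_p_clean = F.log_softmax(logits_clean, dim=-1)
    log_p_pert = F.log_softmax(logits_perturbed, dim=-1)
    # F.kl_div takes log-probabilities as `input`; with log_target=True the
    # target distribution is also given in log space.
    return F.kl_div(log_p_clean, log_p_pert, reduction="batchmean", log_target=True)


def training_step(model, batch, perturb, task_loss_fn, lam=1.0):
    """Task loss plus a weighted consistency penalty (placeholder components)."""
    logits_clean = model(batch["input_ids"]).logits
    logits_pert = model(perturb(batch["input_ids"])).logits
    task_loss = task_loss_fn(logits_clean, batch["labels"])
    return task_loss + lam * consistency_loss(logits_clean, logits_pert)
```

What counts as a "perturbed version of the input" is left open in the abstract; the sketch simply treats it as a function of the input.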
Original language | English |
---|---|
Title of host publication | Proceedings of the 1st Workshop on Multilingual Representation Learning |
Place of Publication | Punta Cana, Dominican Republic |
Publisher | Association for Computational Linguistics |
Pages | 238-241 |
Number of pages | 4 |
ISBN (Electronic) | 978-1-954085-96-1 |
Publication status | Published - 11 Nov 2021 |
Event | 1st Workshop on Multilingual Representation Learning, Punta Cana, Dominican Republic; Duration: 11 Nov 2021 → 11 Nov 2021; https://sites.google.com/view/mrl-2021/home?authuser=0 |
Workshop
Workshop | 1st Workshop on Multilingual Representation Learning |
---|---|
Abbreviated title | MRL 2021 |
Country/Territory | Dominican Republic |
City | Punta Cana |
Period | 11/11/21 → 11/11/21 |
Internet address | https://sites.google.com/view/mrl-2021/home?authuser=0 |