Regularising Fisher Information Improves Cross-lingual Generalisation

Asa Cooper Stickland, Iain Murray

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Many recent works use 'consistency regularisation' to improve the generalisation of fine-tuned pre-trained models, both multilingual and English-only. These works encourage model outputs to be similar between a perturbed and normal version of the input, usually by penalising the Kullback-Leibler (KL) divergence between the probability distributions of the perturbed and normal model. We believe that consistency losses may be implicitly regularising the loss landscape. In particular, we build on work hypothesising that implicitly or explicitly regularising the trace of the Fisher Information Matrix (FIM) amplifies the implicit bias of SGD to avoid memorisation. Our initial results show both empirically and theoretically that consistency losses are related to the FIM, and show that the flat minima implied by a small trace of the FIM improve performance when fine-tuning a multilingual model on additional languages. We aim to confirm these initial results on more datasets, and use our insights to develop better multilingual fine-tuning techniques.
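
The two quantities named in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical linear softmax classifier with weight matrix `W` (one row per class), and the helper names `consistency_loss` and `fisher_trace` are illustrative only. This is not the authors' implementation, and the abstract does not specify the perturbation or the model used in the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def linear_logits(W, x):
    """Logits of the assumed linear classifier: one row of W per class."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def consistency_loss(W, x, x_perturbed):
    """KL divergence between predictions on the normal and perturbed input."""
    p = softmax(linear_logits(W, x))
    q = softmax(linear_logits(W, x_perturbed))
    return kl_divergence(p, q)


def fisher_trace(W, x):
    """Trace of the FIM at a single input x for the assumed linear model.

    Tr(F) = sum_y p(y|x) * ||grad_W log p(y|x)||^2, where for a linear
    softmax model grad_W log p(y|x) = (onehot(y) - p) x^T, so the squared
    norm factorises as ||onehot(y) - p||^2 * ||x||^2.
    """
    p = softmax(linear_logits(W, x))
    x_sq = sum(xi * xi for xi in x)
    trace = 0.0
    for y, p_y in enumerate(p):
        resid_sq = sum(
            ((1.0 if k == y else 0.0) - p_k) ** 2 for k, p_k in enumerate(p)
        )
        trace += p_y * resid_sq * x_sq
    return trace


# Illustrative values only (assumed, not taken from the paper).
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
x = [1.0, 2.0]
x_perturbed = [1.1, 1.9]
loss = consistency_loss(W, x, x_perturbed)
trace = fisher_trace(W, x)
```

For this toy model the FIM trace at one input simplifies to (1 - ||p||^2) ||x||^2, so it shrinks as the predictive distribution p becomes more peaked. In a regulariser, either quantity would be averaged over a batch and added to the task loss with a weighting coefficient.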
Original language: English
Title of host publication: Proceedings of the 1st Workshop on Multilingual Representation Learning
Place of Publication: Punta Cana, Dominican Republic
Publisher: Association for Computational Linguistics
Pages: 238-241
Number of pages: 4
ISBN (Electronic): 978-1-954085-96-1
Publication status: Published - 11 Nov 2021
Event: 1st Workshop on Multilingual Representation Learning - Punta Cana, Dominican Republic
Duration: 11 Nov 2021 → 11 Nov 2021
https://sites.google.com/view/mrl-2021/home?authuser=0

Workshop

Workshop: 1st Workshop on Multilingual Representation Learning
Abbreviated title: MRL 2021
Country/Territory: Dominican Republic
City: Punta Cana
Period: 11/11/21 → 11/11/21
