Few-shot learning through contextual data augmentation

Farid Arthaud, Rachel Bawden, Alexandra Birch

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Machine translation (MT) models used in industries with constantly changing topics, such as translation or news agencies, need to adapt to new data to maintain their performance over time. Our aim is to teach a pre-trained MT model to translate previously unseen words accurately, based on very few examples. We propose (i) an experimental setup allowing us to simulate novel vocabulary appearing in human-submitted translations, and (ii) corresponding evaluation metrics to compare our approaches. We extend a data augmentation approach that uses a pre-trained language model to create training examples with similar contexts for novel words. We compare different fine-tuning and data augmentation approaches and show that adaptation from as few as one to five examples is possible. Combining data augmentation with randomly selected training sentences leads to the highest BLEU score and accuracy improvements. Impressively, with only one to five examples, our model achieves higher accuracy than a reference system trained on an average of 313 parallel examples.
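The abstract describes the experimental setup and evaluation only at a high level. As a hedged illustration, the following Python sketch shows one possible way to carry out three steps:

- Simulate novel vocabulary by holding out every training pair whose source side contains a chosen word.
- Assemble a fine-tuning set that mixes the few-shot examples, augmented examples and randomly sampled original training pairs.
- Score how often a system outputs the expected translation of the novel word.

The helper names (`hold_out_novel_words`, `build_finetuning_set`, `novel_word_accuracy`), the whitespace tokenisation, the exact-match accuracy definition and the toy word "zorb" are all assumptions for illustration, not the authors' code or metrics. The language-model-based augmentation itself is not reproduced; augmented pairs are simply passed in.

```python
import random
from typing import List, Sequence, Tuple

# A sentence pair: (source sentence, target sentence), whitespace-tokenised.
Pair = Tuple[str, str]


def hold_out_novel_words(
    corpus: Sequence[Pair], novel_words: Sequence[str]
) -> Tuple[List[Pair], List[Pair]]:
    """Split a parallel corpus so the chosen source words never occur in training.

    Pairs whose source side contains any of `novel_words` are held out (they
    can later supply few-shot and test examples); all other pairs are kept
    for training the base model. Hypothetical helper, not the paper's code.
    """
    novel = set(novel_words)
    train: List[Pair] = []
    held_out: List[Pair] = []
    for src, tgt in corpus:
        if novel & set(src.split()):
            held_out.append((src, tgt))
        else:
            train.append((src, tgt))
    return train, held_out


def build_finetuning_set(
    few_shot: Sequence[Pair],
    augmented: Sequence[Pair],
    train: Sequence[Pair],
    n_random: int,
    seed: int = 0,
) -> List[Pair]:
    """Mix few-shot pairs, augmented pairs and randomly drawn original pairs.

    Assumption of this sketch: the random original pairs are there to limit
    forgetting of the base training data during fine-tuning.
    """
    rng = random.Random(seed)
    k = min(n_random, len(train))
    mixed = list(few_shot) + list(augmented) + rng.sample(list(train), k)
    rng.shuffle(mixed)
    return mixed


def novel_word_accuracy(hypotheses: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of hypotheses containing the expected (possibly multi-token) translation.

    Hypothetical exact-match metric: a hit means the expected token sequence
    appears contiguously in the whitespace-tokenised hypothesis.
    """
    if len(hypotheses) != len(expected):
        raise ValueError("hypotheses and expected must have the same length")
    if not hypotheses:
        return 0.0
    hits = 0
    for hyp, ref in zip(hypotheses, expected):
        tokens, want = hyp.split(), ref.split()
        n = len(want)
        if any(tokens[i : i + n] == want for i in range(len(tokens) - n + 1)):
            hits += 1
    return hits / len(hypotheses)


if __name__ == "__main__":
    # Toy corpus; "zorb" plays the role of a novel word.
    corpus = [
        ("the cat sleeps", "le chat dort"),
        ("a zorb appeared", "un zorb est apparu"),
        ("the dog runs", "le chien court"),
        ("the zorb sleeps", "le zorb dort"),
    ]
    train, held_out = hold_out_novel_words(corpus, ["zorb"])
    few_shot = held_out[:1]
    augmented = [("a zorb runs", "un zorb court")]  # stand-in for LM-generated pairs
    ft_data = build_finetuning_set(few_shot, augmented, train, n_random=1)
    print(len(train), len(held_out), len(ft_data))  # 2 2 3
    print(novel_word_accuracy(["le zorb dort", "le chat dort"], ["zorb", "zorb"]))  # 0.5
```

Seeding the sampler keeps the fine-tuning mix reproducible across runs. In a real setup the number of random original pairs would be a tuned hyperparameter rather than the single pair used here.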
Original language: English
Title of host publication: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Publisher: Association for Computational Linguistics (ACL)
Pages: 1049-1062
Number of pages: 14
ISBN (Print): 978-1-954085-02-2
Publication status: Published - 19 Apr 2021
Event: 16th Conference of the European Chapter of the Association for Computational Linguistics - Virtual Conference
Duration: 19 Apr 2021 to 23 Apr 2021
https://2021.eacl.org/

Conference

Conference: 16th Conference of the European Chapter of the Association for Computational Linguistics
Abbreviated title: EACL 2021
City: Virtual Conference
Period: 19/04/21 to 23/04/21

