The University of Edinburgh's submission to the WMT22 code-mixing shared task (MixMT)

Faheem Kirefu, Vivek Iyer, Pinzhen Chen, Laurie Burchell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

The University of Edinburgh participated in the WMT22 shared task on code-mixed translation. The task consists of two subtasks: i) generating code-mixed Hindi/English (Hinglish) text from parallel Hindi and English sentences and ii) machine translation from Hinglish to English. As both subtasks are considered low-resource, we focused our efforts on careful data generation and curation, especially the use of backtranslation from monolingual resources. For subtask 1, we explored the effects of constrained decoding on English and transliterated subwords in order to produce Hinglish. For subtask 2, we compared two pretraining techniques: simple initialisation from existing machine translation models, and aligned augmentation. For both subtasks, we found that our baseline systems worked best. Our systems for both subtasks were among the top-performing submissions overall.
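As an illustration of the kind of constrained decoding mentioned for subtask 1, the following is a minimal sketch, not the authors' implementation: at each greedy decoding step, the scores of subwords outside an allowed set are masked to -inf so they can never be chosen. The toy vocabulary, the scoring function `toy_scores`, and the use of a Latin-script (ASCII) filter as a stand-in for "English and transliterated subwords" are all hypothetical assumptions; the submitted systems may use beam search and define the allowed set differently.

```python
import math
from typing import Callable, List, Sequence, Set

# Toy subword vocabulary (hypothetical): index 0 is the end-of-sentence token.
VOCAB = ["</s>", "मैं", "main", "office", "जा", "ja", "raha", "hoon"]
EOS_ID = 0

# Hypothetical stand-in for "English and transliterated subwords":
# keep every Latin-script (ASCII) token, i.e. drop Devanagari ones.
ALLOWED_IDS = {i for i, tok in enumerate(VOCAB) if tok.isascii()}


def toy_scores(prefix: List[int]) -> List[float]:
    """Hypothetical model: fixed next-token scores for each decoding step."""
    table = [
        {"मैं": 0.9, "main": 0.6},
        {"office": 0.8},
        {"जा": 0.9, "ja": 0.7},
        {"raha": 0.8},
        {"hoon": 0.8},
    ]
    row = table[len(prefix)] if len(prefix) < len(table) else {"</s>": 0.9}
    return [row.get(tok, 0.0) for tok in VOCAB]


def constrained_greedy_decode(
    step_scores: Callable[[List[int]], Sequence[float]],
    allowed_ids: Set[int],
    eos_id: int,
    max_len: int = 50,
) -> List[int]:
    """Greedy decoding that may only emit ids in `allowed_ids` (EOS is always allowed)."""
    permitted = set(allowed_ids) | {eos_id}
    output: List[int] = []
    for _ in range(max_len):
        scores = step_scores(output)
        # Mask disallowed tokens so they can never win the argmax.
        masked = [s if i in permitted else -math.inf for i, s in enumerate(scores)]
        next_id = max(range(len(masked)), key=masked.__getitem__)
        if next_id == eos_id:
            break
        output.append(next_id)
    return output


if __name__ == "__main__":
    unconstrained = set(range(len(VOCAB)))
    print(" ".join(VOCAB[i] for i in constrained_greedy_decode(toy_scores, unconstrained, EOS_ID)))
    print(" ".join(VOCAB[i] for i in constrained_greedy_decode(toy_scores, ALLOWED_IDS, EOS_ID)))
```

Without the constraint, the toy model prefers the Devanagari tokens मैं and जा and outputs "मैं office जा raha hoon"; with the constraint, the same model is forced to the romanized alternatives and outputs "main office ja raha hoon".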
Original language: English
Title of host publication: Proceedings of the Seventh Conference on Machine Translation
Editors: Philipp Koehn, Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Tom Kocmi, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Marco Turchi, Marcos Zampieri
Place of Publication: Abu Dhabi, United Arab Emirates
Publisher: Association for Computational Linguistics
Pages: 1145-1157
Number of pages: 13
ISBN (Electronic): 9781959429296
Publication status: Published - 1 Dec 2022
Event: Seventh Conference on Machine Translation - Abu Dhabi, United Arab Emirates
Duration: 7 Dec 2022 → 8 Dec 2022
Conference number: 7
https://statmt.org/wmt22/

Publication series

Name: Proceedings of the Conference on Machine Translation
Publisher: ACL
ISSN (Electronic): 2768-0983

Conference

Conference: Seventh Conference on Machine Translation
Abbreviated title: WMT22
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 7/12/22 → 8/12/22
