Abstract / Description of output
The University of Edinburgh participated in the WMT22 shared task on code-mixed translation. The task consists of two subtasks: i) code-mixed Hindi/English (Hinglish) text generation from parallel Hindi and English sentences and ii) machine translation from Hinglish to English. As both subtasks are considered low-resource, we focused our efforts on careful data generation and curation, especially the use of backtranslation from monolingual resources. For subtask 1, we explored the effects of constrained decoding on English and transliterated subwords in order to produce Hinglish. For subtask 2, we investigated different pretraining techniques, namely comparing simple initialisation from existing machine translation models with aligned augmentation. For both subtasks, we found that our baseline systems worked best. Our systems for both subtasks were among the top-performing submissions overall.
Original language | English |
---|---|
Title of host publication | Proceedings of the Seventh Conference on Machine Translation |
Editors | Philipp Koehn, Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Tom Kocmi, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Marco Turchi, Marcos Zampieri |
Place of Publication | Abu Dhabi, United Arab Emirates |
Publisher | Association for Computational Linguistics |
Pages | 1145-1157 |
Number of pages | 13 |
ISBN (Electronic) | 9781959429296 |
Publication status | Published - 1 Dec 2022 |
Event | Seventh Conference on Machine Translation - Abu Dhabi, United Arab Emirates; Duration: 7 Dec 2022 → 8 Dec 2022; Conference number: 7; https://statmt.org/wmt22/ |
Publication series
Name | Proceedings of the Conference on Machine Translation |
---|---|
Publisher | ACL |
ISSN (Electronic) | 2768-0983 |
Conference
Conference | Seventh Conference on Machine Translation |
---|---|
Abbreviated title | WMT22 |
Country/Territory | United Arab Emirates |
City | Abu Dhabi |
Period | 7/12/22 → 8/12/22 |
Internet address | https://statmt.org/wmt22/ |