Abstract
The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six language directions: English-to-Gujarati, Gujarati-to-English, English-to-Chinese, Chinese-to-English, German-to-English, and English-to-Czech. For all translation directions, we created or used back-translations of monolingual data in the target language as additional synthetic training data. For English-Gujarati, we also explored semi-supervised MT with cross-lingual language model pre-training, and translation pivoting through Hindi. For translation to and from Chinese, we investigated character-based tokenisation vs. sub-word segmentation of Chinese text. For German-to-English, we studied the impact of vast amounts of back-translated training data on translation quality, gaining a few additional insights over Edunov et al. (2018). For English-to-Czech, we compared different pre-processing and tokenisation regimes.
Original language | English |
---|---|
Title of host publication | Proceedings of the Fourth Conference on Machine Translation |
Subtitle of host publication | Volume 2: Shared Task Papers |
Place of Publication | Florence, Italy |
Publisher | Association for Computational Linguistics |
Pages | 302–314 |
Number of pages | 13 |
Volume | 2 |
Publication status | Published - Aug 2019 |
Event | ACL 2019 Fourth Conference on Machine Translation, Florence, Italy. Duration: 1 Aug 2019 → 2 Aug 2019. http://www.statmt.org/wmt19/ |
Conference
Conference | ACL 2019 Fourth Conference on Machine Translation |
---|---|
Abbreviated title | WMT19 |
Country/Territory | Italy |
City | Florence |
Period | 1/08/19 → 2/08/19 |
Keywords
- Machine translation
- Shared task
Publication title: The University of Edinburgh’s Submissions to the WMT19 News Translation Task