Edinburgh Research Explorer

The University of Edinburgh’s Submissions to the WMT19 News Translation Task

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Open Access permissions: Open

Documents

https://arxiv.org/abs/1907.05854
Original language: English
Title of host publication: Proceedings of the Fourth Conference on Machine Translation
Subtitle of host publication: Volume 2: Shared Task Papers
Place of Publication: Florence, Italy
Publisher: Association for Computational Linguistics
Pages: 302–314
Number of pages: 13
Volume: 2
Publication status: Published - Aug 2019
Event: ACL 2019 Fourth Conference on Machine Translation - Florence, Italy
Duration: 1 Aug 2019 to 2 Aug 2019
http://www.statmt.org/wmt19/

Conference

Conference: ACL 2019 Fourth Conference on Machine Translation
Abbreviated title: WMT19
Country: Italy
City: Florence
Period: 1/08/19 to 2/08/19

Abstract

The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six language directions: English-to-Gujarati, Gujarati-to-English, English-to-Chinese, Chinese-to-English, German-to-English, and English-to-Czech. For all translation directions, we created or used back-translations of monolingual data in the target language as additional synthetic training data. For English-Gujarati, we also explored semi-supervised MT with cross-lingual language model pre-training, and translation pivoting through Hindi. For translation to and from Chinese, we investigated character-based tokenisation vs. sub-word segmentation of Chinese text. For German-to-English, we studied the impact of vast amounts of back-translated training data on translation quality, gaining a few additional insights over Edunov et al. (2018). For English-to-Czech, we compared different pre-processing and tokenisation regimes.

Research areas

  • Machine translation, Shared task

ID: 104688356