Edinburgh’s Phrase-based Machine Translation Systems for WMT-14

Nadir Durrani, Barry Haddow, Philipp Koehn, Kenneth Heafield

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


This paper describes the University of Edinburgh’s (UEDIN) phrase-based submissions to the translation and medical translation shared tasks of the 2014 Workshop on Statistical Machine Translation (WMT). We participated in all language pairs. We have improved upon our 2013 system by i) using generalized representations, specifically automatic word clusters, for translations out of English, ii) using unsupervised character-based models to translate unknown words in the Russian-English and Hindi-English pairs, iii) synthesizing Hindi data from closely related Urdu data, and iv) building huge language models on the Common Crawl corpus.
Original language: English
Title of host publication: Proceedings of the Ninth Workshop on Statistical Machine Translation
Place of Publication: Baltimore, Maryland, USA
Publisher: Association for Computational Linguistics
Number of pages: 8
Publication status: Published - 1 Jun 2014