Tunable Distortion Limits and Corpus Cleaning for SMT

Sara Stymne, Christian Hardmeier, Jörg Tiedemann, Joakim Nivre

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

We describe the Uppsala University system for English-to-German translation at WMT13. We use the Docent decoder, a local search decoder that translates at the document level. We extend Docent with tunable distortion limits, that is, soft constraints on the maximum allowed distortion. We also investigate cleaning of the noisy Common Crawl corpus and show that alignment-based filtering can be used for cleaning with good results. Finally, we investigate the effects of corpus selection for recasing.
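The abstract names two techniques without spelling them out. As a minimal sketch, assuming a standard phrase-based setting, the Python below shows one plausible way to turn a hard distortion limit into a soft penalty, and one plausible alignment-based filter for noisy sentence pairs. Neither is claimed to be Docent's implementation or the authors' exact method: the function names `distortion_jumps`, `soft_distortion_penalty`, `aligned_ratio` and `keep_pair`, the penalty definition (total amount by which each jump exceeds the limit) and the `min_ratio` threshold of 0.5 are all hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

# Word alignment as (source index, target index) links.
Alignment = List[Tuple[int, int]]


def distortion_jumps(phrase_spans: Sequence[Tuple[int, int]]) -> List[int]:
    """Jump distance for each phrase.

    `phrase_spans` holds inclusive source spans (start, end), listed in the
    order the phrases appear in the output. The first phrase is measured
    from a virtual position -1, so starting at source word 0 costs nothing.
    """
    jumps = []
    prev_end = -1
    for start, end in phrase_spans:
        jumps.append(abs(start - prev_end - 1))
        prev_end = end
    return jumps


def soft_distortion_penalty(phrase_spans: Sequence[Tuple[int, int]], limit: int) -> int:
    """Hypothetical soft-limit feature: total amount by which jumps exceed `limit`.

    A hard limit would instead reject any hypothesis for which this is positive.
    """
    return sum(max(0, jump - limit) for jump in distortion_jumps(phrase_spans))


def aligned_ratio(n_tokens: int, indices: Iterable[int]) -> float:
    """Fraction of the `n_tokens` positions that occur in at least one link."""
    return len(set(indices)) / n_tokens if n_tokens else 0.0


def keep_pair(src: List[str], tgt: List[str], alignment: Alignment,
              min_ratio: float = 0.5) -> bool:
    """Hypothetical filter: keep a pair only if enough tokens on BOTH sides are aligned."""
    src_ratio = aligned_ratio(len(src), (i for i, _ in alignment))
    tgt_ratio = aligned_ratio(len(tgt), (j for _, j in alignment))
    return min(src_ratio, tgt_ratio) >= min_ratio
```

For example, source spans (0, 1), (6, 7), (2, 5) give jumps of 0, 4 and 6, so with `limit=5` the penalty is 1. Under a hard limit such a hypothesis would simply be disallowed; as a soft constraint the penalty becomes a feature whose weight can be tuned along with the other model weights, so the decoder may still accept a long jump when the rest of the model strongly favours it.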
Original language: English
Title of host publication: Proceedings of the Eighth Workshop on Statistical Machine Translation
Place of Publication: Sofia, Bulgaria
Publisher: Association for Computational Linguistics
Pages: 225-231
Number of pages: 7
ISBN (Electronic): 978-1-937284-57-2
Publication status: Published - 9 Aug 2013
Event: ACL 2013 Eighth Workshop on Statistical Machine Translation - Sofia, Bulgaria
Duration: 8 Aug 2013 to 9 Aug 2013
http://www.statmt.org/wmt13/

Workshop

Workshop: ACL 2013 Eighth Workshop on Statistical Machine Translation
Abbreviated title: WMT13
Country: Bulgaria
City: Sofia
Period: 8/08/13 to 9/08/13
