Mixed-Domain vs. Multi-Domain Statistical Machine Translation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Domain adaptation boosts translation quality on in-domain data, but the translation quality of domain-adapted systems on out-of-domain data tends to suffer. Users of web-based translation services expect high-quality translation across a wide range of diverse domains, and the task is made even more difficult by the fact that no domain label is provided with the translation request.

In this paper we present an approach to domain adaptation that results in large-scale, general-purpose machine translation systems. First, we tune our translation models to multiple individual domains. Then, by means of source-side domain classification, we predict the domain of each input sentence and select the corresponding domain-specific model parameters. We call this approach multi-domain translation.
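
To make the routing idea concrete, here is a minimal, self-contained sketch of per-sentence domain selection: classify each input sentence's domain on the source side, then decode with the parameter set tuned for that domain. Everything in it is illustrative and assumed, not the paper's actual system: the keyword-cue classifier stands in for a real source-side classifier, and the weight values stand in for tuned feature weights (e.g. from MERT on a domain-specific development set).

```python
# Hypothetical feature weights: one tuned set per domain.
DOMAIN_PARAMS = {
    "ted":      {"lm": 0.5, "tm": 0.3, "wp": -0.2},
    "europarl": {"lm": 0.4, "tm": 0.4, "wp": -0.1},
    "news":     {"lm": 0.6, "tm": 0.2, "wp": -0.3},
}

# Toy keyword cues standing in for a trained source-side domain classifier.
DOMAIN_CUES = {
    "ted":      {"talk", "audience", "applause"},
    "europarl": {"parliament", "commission", "member"},
    "news":     {"reported", "yesterday", "officials"},
}

def classify_domain(sentence: str) -> str:
    """Predict the domain of a single source sentence by cue overlap."""
    tokens = set(sentence.lower().split())
    scores = {d: len(tokens & cues) for d, cues in DOMAIN_CUES.items()}
    return max(scores, key=scores.get)

def translate(sentence: str, decoder) -> str:
    """Route one sentence: predict its domain, then decode with the
    parameters tuned for that domain."""
    domain = classify_domain(sentence)
    return decoder(sentence, DOMAIN_PARAMS[domain])

# Example usage with a dummy decoder that just echoes its inputs.
dummy = lambda s, params: f"<translated with {params}> {s}"
print(translate("the parliament and the commission met", dummy))
```

The design point is that classification happens per sentence at translation time, so no domain label is needed from the user; a mixed-domain system would instead use a single parameter set for every input.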

We develop state-of-the-art, domain-adapted translation engines for three broadly defined domains: TED talks, Europarl, and News. Our results suggest that multi-domain translation outperforms a mixed-domain approach, in which a single system is tuned on a development set composed of samples from many domains.
Original language: English
Title of host publication: Proceedings of MT Summit XV, vol. 1: MT Researchers' Track
Pages: 240-255
Number of pages: 16
Publication status: Published - Nov 2015
