Handling Technical OOVs in SMT

Mark Fishel, Rico Sennrich

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

Abstract

We present a project on machine translation of software help desk tickets, a highly technical text domain. The main source of translation errors was out-of-vocabulary tokens (OOVs), most of which were either in-domain German compounds or technical token sequences that must be preserved verbatim in the output. We describe our work on compound splitting and on the treatment of non-translatable tokens, which led to a significant gain in translation quality.
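The abstract does not say how non-translatable tokens are handled. One common approach, shown here as an assumption and not as the authors' method, is to mask such tokens with placeholders before decoding and restore them verbatim afterwards. The regular expressions (URLs, file paths, upper-case error codes) and the `__TECH<n>__` placeholder format are hypothetical, illustrative choices.

```python
import re

# Hypothetical patterns for "technical" tokens; the paper's actual rules are not given here.
TECH_PATTERN = re.compile(
    r"https?://\S+"            # URLs
    r"|[A-Za-z]:\\\S+"         # Windows paths such as C:\temp\log.txt
    r"|(?:/[\w.-]+){2,}"       # Unix-style paths such as /var/log/app.log
    r"|\b[A-Z]{2,}-?\d+\b"     # error or ticket codes such as ERR42 or SAP-1234
)


def mask_technical_tokens(sentence):
    """Replace technical tokens with numbered placeholders; return masked text and mapping."""
    mapping = {}

    def _sub(match):
        placeholder = f"__TECH{len(mapping)}__"
        mapping[placeholder] = match.group(0)
        return placeholder

    return TECH_PATTERN.sub(_sub, sentence), mapping


def unmask_technical_tokens(translation, mapping):
    """Put the original technical tokens back, verbatim, into the translated output."""
    for placeholder, original in mapping.items():
        translation = translation.replace(placeholder, original)
    return translation


if __name__ == "__main__":
    source = "Fehler ERR42 in /var/log/app.log, siehe https://example.com/kb"
    masked, mapping = mask_technical_tokens(source)
    print(masked)
    # Stand-in for the SMT system's output on the masked sentence (not a real decoder call).
    translated = "Error __TECH0__ in __TECH1__, see __TECH2__"
    print(unmask_technical_tokens(translated, mapping))
```

This sketch assumes the decoder copies the placeholder tokens through unchanged (Moses, for example, passes unknown words through by default); if the placeholders can be reordered or dropped, the restoration step needs to be more robust.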
Original language: English
Title of host publication: The Seventeenth Annual Conference of the European Association for Machine Translation (EAMT2014)
Pages: 159-162
Number of pages: 4
Publication status: Published - 2014
