We present a project on machine translation of software help desk tickets, a highly technical text domain. The main source of translation errors was out-of-vocabulary tokens (OOVs), most of which were either in-domain German compounds or technical token sequences that must be preserved verbatim in the output. We describe our work on compound splitting and the treatment of non-translatable tokens, which led to a significant gain in translation quality.
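
The two preprocessing ideas named above can be illustrated with a minimal sketch. It is not the system described in the paper: the regular expressions, the `__NT<n>__` placeholder format, the toy vocabulary, and the function names `mask`, `unmask`, and `split_compound` are all assumptions made for illustration. The sketch masks technical token sequences before translation so they can be restored verbatim afterwards, and splits a German compound into known parts by dictionary lookup, allowing a linking "s".

```python
import re

# Hypothetical patterns for tokens that should be copied verbatim
# (illustrative only; a real system would need a broader inventory).
NON_TRANSLATABLE = re.compile(
    r"https?://\S+"           # URLs
    r"|[A-Za-z]:\\\S+"        # Windows-style file paths
    r"|\b\d+(?:\.\d+){2,}\b"  # dotted version numbers such as 10.2.1
)


def mask(text):
    """Replace non-translatable spans with numbered placeholders."""
    kept = []

    def repl(match):
        kept.append(match.group(0))
        return f"__NT{len(kept) - 1}__"

    return NON_TRANSLATABLE.sub(repl, text), kept


def unmask(text, kept):
    """Restore the original spans, assuming placeholders survived translation."""
    return re.sub(r"__NT(\d+)__", lambda m: kept[int(m.group(1))], text)


def split_compound(word, vocab, min_len=3):
    """Split a compound into known parts, preferring the longest head.

    Returns a list of lowercase parts, or None if no full split is found.
    """
    w = word.lower()
    if w in vocab:
        return [w]
    for i in range(len(w) - min_len, min_len - 1, -1):
        head, tail = w[:i], w[i:]
        candidates = [head]
        if head.endswith("s"):
            candidates.append(head[:-1])  # strip a linking "s"
        for cand in candidates:
            if cand in vocab:
                rest = split_compound(tail, vocab, min_len)
                if rest is not None:
                    return [cand] + rest
    return None


if __name__ == "__main__":
    vocab = {"drucker", "treiber", "installation", "fehler"}  # toy vocabulary
    print(split_compound("Druckertreiber", vocab))
    print(split_compound("Installationsfehler", vocab))

    source = r"Fehler in C:\Temp\setup.log nach Update 10.2.1"
    masked, kept = mask(source)
    print(masked)
    print(unmask(masked, kept))
```

In this sketch the translation system would see only the masked sentence and the split compound parts; whether a given system passes placeholders through unchanged is itself an assumption that would need checking.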
|Title of host publication|The Seventeenth Annual Conference of the European Association for Machine Translation (EAMT2014)|
|Number of pages|4|
|Publication status|Published - 2014|