Abstract
We present a new approach to domain adaptation for statistical machine translation (SMT) that enriches standard phrase-based models with lexicalised word and phrase pair features to help the model select appropriate translations for the target domain (TED talks). In addition, we show how source-side sentence-level topics can be incorporated to make the features differentiate between more fine-grained topics within the target domain (topic adaptation). We compare tuning our sparse features on a development set versus on the entire in-domain corpus and introduce a new method of porting them to larger mixed-domain models. Experimental results show that our features improve performance over a MIRA baseline and that in some cases we can get additional improvements with topic features. We evaluate our methods on two language pairs, English-French and German-English, showing promising results.
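
To make the idea of lexicalised word and phrase pair features more concrete, the following is a minimal sketch of how such sparse indicator features might be extracted for a single phrase pair and optionally conjoined with a sentence-level topic. It is an illustration under stated assumptions, not the paper's implementation: the function names `sparse_features` and `score`, the feature-name prefixes `pp_`, `wp_` and `t<k>_`, the integer topic id, and the example weights are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def sparse_features(
    src_phrase: str,
    tgt_phrase: str,
    alignment: Iterable[Tuple[int, int]],
    topic: Optional[int] = None,
) -> Counter:
    """Return sparse indicator-feature counts for one phrase pair.

    All feature names are hypothetical and chosen for illustration only.
    """
    src = src_phrase.split()
    tgt = tgt_phrase.split()
    feats: Counter = Counter()

    # One phrase pair feature for the whole source/target phrase.
    feats[f"pp_{src_phrase}~{tgt_phrase}"] += 1

    # One word pair feature per aligned (source index, target index) link.
    for i, j in alignment:
        feats[f"wp_{src[i]}~{tgt[j]}"] += 1

    # Assumed form of topic adaptation: duplicate each feature with a
    # prefix naming the topic assigned to the source sentence.
    if topic is not None:
        for name, value in list(feats.items()):
            feats[f"t{topic}_{name}"] += value

    return feats


def score(feats: Dict[str, int], weights: Dict[str, float]) -> float:
    """Dot product of sparse feature counts with a sparse weight vector."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


# German-English toy example with a made-up topic id and made-up weights.
feats = sparse_features("die Zelle", "the cell", [(0, 0), (1, 1)], topic=3)
weights = {"wp_Zelle~cell": 0.5, "t3_wp_Zelle~cell": 0.25}
total = score(feats, weights)
```

In a sketch like this, the topic-prefixed copies let a tuning algorithm learn a separate weight for the same translation choice under different topics, while the unprefixed features still capture domain-wide preferences.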
| Original language | English |
|---|---|
| Title of host publication | 2012 International Workshop on Spoken Language Translation, IWSLT 2012, Hong Kong, December 6-7, 2012 |
| Pages | 268-275 |
| Number of pages | 8 |
| Publication status | Published - 2012 |