Sparse lexicalised features and topic adaptation for SMT

Eva Hasler, Barry Haddow, Philipp Koehn

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

We present a new approach to domain adaptation for SMT that enriches standard phrase-based models with lexicalised word and phrase pair features to help the model select appropriate translations for the target domain (TED talks). In addition, we show how source-side sentence-level topics can be incorporated to make the features differentiate between more fine-grained topics within the target domain (topic adaptation). We compare tuning our sparse features on a development set versus on the entire in-domain corpus and introduce a new method of porting them to larger mixed-domain models. Experimental results show that our features improve performance over a MIRA baseline and that in some cases we can get additional improvements with topic features. We evaluate our methods on two language pairs, English-French and German-English, showing promising results.
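
The abstract does not spell out the feature templates. The following is a minimal Python sketch, under stated assumptions, of how lexicalised word pair and phrase pair indicator features might be extracted for one applied phrase pair, optionally conjoined with a sentence-level topic, and scored as part of a linear model. The feature-name scheme (`wp_`, `pp_`, `t<k>_` prefixes), the helper names `sparse_features` and `score`, and the assumption that each source sentence carries a single topic id (for example, the dominant topic from a topic model) are all hypothetical and not taken from the paper. In practice the weights would be learned during tuning (the abstract mentions a MIRA baseline); here they are set by hand.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional


def sparse_features(
    src_phrase: list[str],
    tgt_phrase: list[str],
    alignment: list[tuple[int, int]],
    topic: Optional[int] = None,
) -> Counter:
    """Return sparse indicator features for one applied phrase pair.

    Hypothetical naming scheme: 'wp_<src>~<tgt>' for each aligned word pair,
    'pp_<src phrase>~<tgt phrase>' for the whole phrase pair, plus a
    't<k>_'-prefixed copy of every feature when a topic id k is given.
    """
    feats: Counter = Counter()
    # Word pair features: one indicator per aligned source/target word pair.
    for i, j in alignment:
        feats[f"wp_{src_phrase[i]}~{tgt_phrase[j]}"] += 1
    # Phrase pair feature: one indicator for the phrase pair as a whole.
    feats[f"pp_{' '.join(src_phrase)}~{' '.join(tgt_phrase)}"] += 1
    # Topic features (assumed form): conjoin each feature with the sentence's
    # topic id so tuned weights can prefer different translations per topic.
    if topic is not None:
        for name, count in list(feats.items()):  # snapshot before adding keys
            feats[f"t{topic}_{name}"] += count
    return feats


def score(feats: Counter, weights: dict[str, float]) -> float:
    """Sparse dot product; features without a tuned weight contribute 0."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


if __name__ == "__main__":
    f = sparse_features(["the", "house"], ["la", "maison"], [(0, 0), (1, 1)], topic=3)
    print(sorted(f))
    print(score(f, {"wp_house~maison": 0.5, "t3_wp_house~maison": 0.25}))
```

Because the topic features are copies of the general ones, a tuned model can keep a domain-wide preference (the unprefixed weight) and adjust it per topic (the prefixed weight); in the example the two weights simply add up to 0.75 for that word pair.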
Original language: English
Title of host publication: 2012 International Workshop on Spoken Language Translation, IWSLT 2012, Hong Kong, December 6-7, 2012
Pages: 268-275
Number of pages: 8
Publication status: Published - 2012
