In Statistical Machine Translation, in-domain and out-of-domain training data are not always clearly delineated. This paper investigates how mixture-modeling techniques can still be used for domain adaptation in such cases. We apply unsupervised clustering methods to split the original training set, and then use mixture-modeling techniques to build a model adapted to a given target domain. We show that this approach improves performance over both an unadapted baseline and several alternative domain-adaptation methods.
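The pipeline the abstract describes, unsupervised clustering of mixed training data followed by mixture weights tuned to a target domain, can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the corpus, the bag-of-words k-means clustering, and the EM-tuned linear interpolation of unigram language models are all illustrative stand-ins for the paper's actual components.

```python
# Toy sketch: split a mixed-domain corpus by unsupervised clustering, then
# learn linear-mixture weights on target-domain dev data. All data and model
# choices here are illustrative assumptions, not the paper's setup.
from collections import Counter

# Mixed training corpus (hypothetical): finance-like and software-like text.
corpus = [
    "the market rose as stocks gained value",
    "investors sold shares after the market fell",
    "the bank reported strong quarterly profit",
    "compile the code and run the unit tests",
    "the function returns a pointer to the buffer",
    "debug the program and fix the memory leak",
]
dev = ["run the tests and fix the code"]  # target-domain development data

vocab = sorted({w for s in corpus + dev for w in s.split()})

def vec(sentence):
    """Bag-of-words count vector over the shared vocabulary."""
    c = Counter(sentence.split())
    return [c[w] for w in vocab]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Minimal Lloyd's k-means; seeds with evenly spaced points."""
    centroids = [points[i * len(points) // k][:] for i in range(k)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assign

def unigram_lm(sentences, alpha=0.1):
    """Add-alpha smoothed unigram probabilities over the shared vocab."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

# Step 1: unsupervised split of the original training set.
assign = kmeans([vec(s) for s in corpus], k=2)
clusters = [[s for s, a in zip(corpus, assign) if a == j] for j in range(2)]
lms = [unigram_lm(c) for c in clusters]

# Step 2: EM for linear-mixture weights maximizing dev-set likelihood,
# i.e. P(w) = sum_j weight_j * P_j(w).
weights = [1.0 / len(lms)] * len(lms)
dev_words = [w for s in dev for w in s.split()]
for _ in range(50):
    post_sums = [0.0] * len(lms)
    for w in dev_words:
        posts = [wt * lm[w] for wt, lm in zip(weights, lms)]
        z = sum(posts)
        for j in range(len(lms)):
            post_sums[j] += posts[j] / z  # E-step: component posteriors
    weights = [s / len(dev_words) for s in post_sums]  # M-step

print(weights)
```

The dev sentence overlaps mostly with the software-like cluster, so EM should concentrate the mixture weight on that cluster's model, which is the adaptation effect the paper exploits at the level of full translation models.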
Title of host publication: Proceedings of the 16th EAMT Conference
Number of pages: 8
Publication status: Published - May 2012
Event: 16th Annual Conference of the European Association for Machine Translation (EAMT), Fondazione Bruno Kessler (FBK) Center for Scientific and Technological Research, Trento, Italy
Duration: 28 May 2012 - 30 May 2012