Online Learning Methods For Discriminative Training of Phrase Based Statistical Machine Translation

Abhishek Arun, Philipp Koehn

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper investigates discriminative training of a phrase-based SMT system with millions of features using the structured perceptron and the Margin Infused Relaxed Algorithm (MIRA), two popular online learning algorithms. We also compare two update strategies: a conservative one, in which we update towards an oracle translation candidate extracted from an N-best list, and a more aggressive one, in which we update towards an oracle extracted prior to training using a minloss decoder. We evaluate the training algorithms on the Czech-English translation task. Our results show that while both learning algorithms achieve similar results, with the perceptron converging more rapidly, the aggressive update strategy performs significantly worse than the more conservative strategy, corroborating the findings of Liang et al. (2006).
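
The conservative update strategy mentioned in the abstract can be illustrated with a minimal sketch of one structured perceptron step. This is not the authors' implementation: the function names, the sparse-dictionary feature representation, the per-candidate loss value, and the learning rate are all assumptions made for illustration. The oracle is taken to be the lowest-loss candidate in the N-best list, and the weights move towards its features and away from those of the current model-best candidate.

```python
from collections import defaultdict


def score(weights, features):
    """Dot product of a sparse feature dict with the weight vector."""
    return sum(weights[name] * value for name, value in features.items())


def perceptron_update(weights, nbest, learning_rate=1.0):
    """One perceptron step towards the N-best oracle (illustrative only).

    nbest is a list of (features, loss) pairs, where features is a dict
    mapping hypothetical feature names to values and loss is any
    lower-is-better quality measure for the candidate.
    Returns True if the weights were changed.
    """
    predicted = max(nbest, key=lambda cand: score(weights, cand[0]))
    oracle = min(nbest, key=lambda cand: cand[1])
    if predicted is oracle:
        return False  # the model already prefers the oracle candidate
    # Move towards the oracle's features and away from the prediction's.
    for name, value in oracle[0].items():
        weights[name] += learning_rate * value
    for name, value in predicted[0].items():
        weights[name] -= learning_rate * value
    return True


weights = defaultdict(float)
nbest = [({"phrase_a": 1.0}, 0.5), ({"phrase_b": 1.0}, 0.0)]
changed = perceptron_update(weights, nbest)
```

In this toy N-best list the second candidate has the lowest loss, so a single update raises the weight of `phrase_b` and lowers that of `phrase_a`; a second call leaves the weights unchanged because the model then ranks the oracle first.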
Original language: English
Title of host publication: MT SUMMIT XI 10-14 September 2007, Copenhagen, Denmark, Proceedings
Publisher: European Association for Machine Translation
Number of pages: 6
ISBN (Print): 9788790708160
Publication status: Published - 2007
