Large-scale discriminative machine translation promises to further the state-of-the-art, but has failed to deliver convincing gains over current heuristic frequency count systems. We argue that a principal reason for this failure is that these systems do not deal with multiple, equivalent translations. We present a translation model which treats derivations as a latent variable, in both training and decoding, and is fully discriminative and globally optimised. Results show that accounting for multiple derivations does indeed improve performance. Additionally, we show that regularisation is essential for maximum conditional likelihood models in order to avoid degenerate solutions.
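The core idea of the abstract can be illustrated with a minimal sketch (not the paper's actual model or feature set): a log-linear model in which each candidate translation is reachable through several derivations, the conditional likelihood marginalises over those derivations, and an L2 penalty regularises the weights to avoid the degenerate solutions mentioned above. All data structures and the `l2` strength below are illustrative assumptions.

```python
# Hedged sketch: latent-derivation conditional likelihood with L2 regularisation.
# Toy setup (NOT the paper's model): each training instance pairs a list of
# candidate translations with the index of the gold translation; every
# candidate has one feature vector per derivation.
import math

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def objective(weights, data, l2=0.1):
    """Regularised conditional log-likelihood, marginalising derivations.

    data: list of (feats_by_translation, gold_index), where
    feats_by_translation[e] is a list of derivation feature vectors."""
    total = 0.0
    for feats_by_e, gold in data:
        # Linear score of every derivation of every candidate translation.
        all_scores = [sum(w * x for w, x in zip(weights, d))
                      for e in feats_by_e for d in e]
        gold_scores = [sum(w * x for w, x in zip(weights, d))
                       for d in feats_by_e[gold]]
        # log p(e* | f) = logsumexp(gold derivations) - logsumexp(all derivations)
        total += log_sum_exp(gold_scores) - log_sum_exp(all_scores)
    # L2 penalty keeps weights from growing without bound on separable data.
    return total - l2 * sum(w * w for w in weights)
```

Maximising this objective (e.g. with gradient ascent) credits every derivation of the correct translation rather than a single arbitrary one, which is the "multiple, equivalent translations" point the abstract makes.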
Title of host publication: Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL 2008)
Subtitle of host publication: Human Language Technologies
Publisher: Association for Computational Linguistics
Number of pages: 9
Publication status: Published - 2008