A tree does not make a well-formed sentence: Improving syntactic string-to-tree statistical machine translation with more linguistic knowledge

Rico Sennrich, Philip Williams, Matthias Huck

Research output: Contribution to journal › Article › peer-review

Abstract / Description of output

Synchronous context-free grammars (SCFGs) can be learned from parallel texts that are annotated with target-side syntax, and can produce translations by building target-side syntactic trees from source strings. Ideally, producing syntactic trees would entail that the translation is grammatically well-formed, but in reality, this is often not the case. Focusing on translation into German, we discuss various ways in which string-to-tree translation models over- or undergeneralise. We show how these problems can be addressed by choosing a suitable parser and modifying its output, by introducing linguistic constraints that enforce morphological agreement and constrain subcategorisation, and by modelling the productive generation of German compounds.
Original language: English
Pages (from-to): 27-45
Number of pages: 19
Journal: Computer Speech and Language
Volume: 32
Issue number: 1
Publication status: Published - Jul 2015

Keywords / Materials (for Non-textual outputs)

  • Morphology
  • Statistical machine translation
  • Syntactic translation models
  • String-to-tree models
