We present progress on Joshua, an open-source decoder for hierarchical and syntax-based machine translation. The main focus is Thrax, a flexible, open-source synchronous context-free grammar extractor. Thrax extracts both hierarchical (Chiang, 2007) and syntax-augmented machine translation (Zollmann and Venugopal, 2006) grammars. It is built on Apache Hadoop for efficient distributed processing, and can easily be extended with support for new grammars, feature functions, and output formats.
|Title of host publication|Proceedings of the Sixth Workshop on Statistical Machine Translation|
|Place of Publication|Edinburgh, Scotland|
|Publisher|Association for Computational Linguistics|
|Number of pages|7|
|Publication status|Published - 1 Jul 2011|