Abstract
We describe the structure of a space-efficient phrase table for phrase-based statistical machine translation with the Moses decoder. The new phrase table can be used in memory or be partially mapped from disk. Compared to the standard Moses on-disk phrase table implementation, it achieves a size reduction by a factor of 6.
The focus of this work lies on the source phrase index, which is implemented using minimal perfect hash functions. Two methods are discussed that reduce the memory consumption of a baseline implementation.
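To make the idea of a source phrase index built on a minimal perfect hash function (MPH) more concrete, the following is a minimal sketch and not the implementation described in the paper. It assumes a simple two-level "bucket plus per-bucket seed" construction, and it assumes that a short fingerprint is stored per slot so that phrases outside the table can be rejected. The names `seeded_hash`, `build_mph`, `SourcePhraseIndex`, and `FINGERPRINT_SEED` are hypothetical and exist only for this illustration.

```python
import hashlib
from typing import List, Optional

FINGERPRINT_SEED = 0xFFFFFFFF  # arbitrary seed reserved for fingerprints (assumption)


def seeded_hash(seed: int, key: str) -> int:
    """Deterministic seeded 64-bit hash; Python's built-in hash() is salted per process."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def build_mph(keys: List[str]) -> List[int]:
    """Return one seed per bucket so that every key gets a unique slot in [0, n)."""
    n = len(keys)
    if n == 0 or len(set(keys)) != n:
        raise ValueError("keys must be non-empty and distinct")
    buckets: List[List[str]] = [[] for _ in range(n)]
    for key in keys:
        buckets[seeded_hash(0, key) % n].append(key)

    seeds = [0] * n
    occupied = [False] * n
    # Place the largest buckets first, while most slots are still free.
    for b in sorted(range(n), key=lambda i: len(buckets[i]), reverse=True):
        if not buckets[b]:
            break  # only empty buckets remain
        seed = 1
        while True:
            slots = [seeded_hash(seed, k) % n for k in buckets[b]]
            if len(set(slots)) == len(slots) and not any(occupied[s] for s in slots):
                break
            seed += 1
        seeds[b] = seed
        for s in slots:
            occupied[s] = True
    return seeds


class SourcePhraseIndex:
    """Maps each known source phrase to a unique slot without storing the phrases."""

    def __init__(self, phrases: List[str], fingerprint_bits: int = 16) -> None:
        self.seeds = build_mph(phrases)
        self.mask = (1 << fingerprint_bits) - 1
        self.fingerprints = [0] * len(phrases)
        for phrase in phrases:
            self.fingerprints[self._slot(phrase)] = self._fingerprint(phrase)

    def _slot(self, phrase: str) -> int:
        n = len(self.seeds)
        return seeded_hash(self.seeds[seeded_hash(0, phrase) % n], phrase) % n

    def _fingerprint(self, phrase: str) -> int:
        return seeded_hash(FINGERPRINT_SEED, phrase) & self.mask

    def find(self, phrase: str) -> Optional[int]:
        """Return the slot of a known phrase, or None if the fingerprint does not match."""
        slot = self._slot(phrase)
        return slot if self.fingerprints[slot] == self._fingerprint(phrase) else None
```

In this sketch the index keeps only one seed and one fingerprint per phrase, which is where the memory savings of an MPH-based index come from. The returned slot would then be used to address the stored target phrases. A phrase that is not in the table is rejected unless its fingerprint happens to collide with the one stored in its slot, so a wider fingerprint trades memory for fewer false positives.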
Original language | English |
---|---|
Title of host publication | Text, Speech and Dialogue |
Subtitle of host publication | 15th International Conference, TSD 2012, Brno, Czech Republic, September 3-7, 2012. Proceedings |
Editors | Petr Sojka, Ales Horák, Ivan Kopecek, Karel Pala |
Place of Publication | Berlin, Heidelberg |
Publisher | Springer |
Pages | 320-327 |
Number of pages | 8 |
ISBN (Electronic) | 978-3-642-32790-2 |
ISBN (Print) | 978-3-642-32789-6 |
Publication status | Published - 2012 |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer Berlin Heidelberg |
Volume | 7499 |
ISSN (Print) | 0302-9743 |
Keywords / Materials (for Non-textual outputs)
- statistical machine translation
- compact phrase table
- minimal perfect hash function
- Moses