Abstract
This paper describes the first steps towards a minimum-size phrase table implementation for phrase-based statistical machine translation. The focus lies on reducing the size of target language data in a phrase table. Rank Encoding (R-Enc), a novel method for compressing word-aligned target language data in phrase tables, is presented. Combined with Huffman coding, R-Enc achieves a relative size reduction of 56 percent for target phrase words and alignment data compared to bare Huffman coding without R-Enc. In the context of the complete phrase table, the size reduction is 22 percent.
Original language | English |
---|---|
Title of host publication | 16th Annual Conference of the European Association for Machine Translation (EAMT) |
Editors | Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way |
Place of Publication | Trento, Italy |
Pages | 245-252 |
Number of pages | 8 |
Publication status | Published - 2012 |
Event | 16th Annual Conference of the European Association for Machine Translation (EAMT) - Fondazione Bruno Kessler (FBK) Center for Scientific and Technological Research, Trento, Italy. Duration: 28 May 2012 → 30 May 2012 |
Conference
Conference | 16th Annual Conference of the European Association for Machine Translation (EAMT) |
---|---|
Country/Territory | Italy |
City | Trento |
Period | 28/05/12 → 30/05/12 |