A Phrase Table without Phrases: Rank Encoding for Better Phrase Table Compression

Marcin Junczys-Dowmunt

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper describes the first steps towards a minimum-size phrase table implementation to be used for phrase-based statistical machine translation. The focus lies on the size reduction of target language data in a phrase table. Rank Encoding (R-Enc), a novel method for the compression of word-aligned target language in phrase tables, is presented. Combined with Huffman coding, it achieves a relative size reduction of 56 percent for target phrase words and alignment data when compared to bare Huffman coding without R-Enc. In the context of the complete phrase table, the size reduction is 22 percent.
Original language: English
Title of host publication: 16th Annual Conference of the European Association for Machine Translation (EAMT)
Editors: Mauro Cettolo, Marcello Federico, Lucia Specia, Andy Way
Place of Publication: Trento, Italy
Pages: 245-252
Number of pages: 8
Publication status: Published - 2012
Event: 16th Annual Conference of the European Association for Machine Translation (EAMT) - Fondazione Bruno Kessler (FBK) Center for Scientific and Technological Research, Trento, Italy
Duration: 28 May 2012 to 30 May 2012

Conference

Conference: 16th Annual Conference of the European Association for Machine Translation (EAMT)
Country/Territory: Italy
City: Trento
Period: 28/05/12 to 30/05/12
