## Abstract / Description of output

We present an efficient algorithm to estimate large modified Kneser-Ney models, including interpolation. Streaming and sorting enable the algorithm to scale to much larger models by using a fixed amount of RAM and a variable amount of disk. Using one machine with 140 GB RAM for 2.8 days, we built an unpruned model on 126 billion tokens. Machine translation experiments with this model show an improvement of 0.8 BLEU points over constrained systems for the 2013 Workshop on Machine Translation task in three language pairs. Our algorithm is also faster for small models: we estimated a model on 302 million tokens using 7.7% of the RAM and 14.0% of the wall time taken by SRILM. The code is open source as part of KenLM.
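The idea of trading a fixed amount of RAM for a variable amount of disk can be illustrated with a small external-sort counter. The sketch below is not KenLM's implementation; it is a minimal illustration under assumed names (`count_ngrams`, `write_sorted_run`, `read_run`, and the `max_entries` memory budget are all hypothetical). It counts n-grams in bounded memory by spilling sorted runs to disk and then merging them in a single streaming pass.

```python
import heapq
import itertools
import os
import tempfile
from collections import Counter


def ngrams(tokens, order):
    """Yield every n-gram of length `order` as a space-joined string."""
    for i in range(len(tokens) - order + 1):
        yield " ".join(tokens[i:i + order])


def write_sorted_run(counts, directory):
    """Spill one in-memory block of counts to disk as a sorted run."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run")
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        for gram in sorted(counts):
            out.write(f"{gram}\t{counts[gram]}\n")
    return path


def read_run(path):
    """Stream (ngram, count) pairs back from a sorted run file."""
    with open(path, encoding="utf-8") as run:
        for line in run:
            gram, count = line.rstrip("\n").split("\t")
            yield gram, int(count)


def count_ngrams(lines, order, max_entries, directory):
    """Yield (ngram, count) in sorted order, holding at most about
    `max_entries` distinct n-grams in memory (hypothetical budget)."""
    counts, runs = Counter(), []
    for line in lines:
        counts.update(ngrams(line.split(), order))
        if len(counts) >= max_entries:
            # Memory budget reached: sort this block and move it to disk.
            runs.append(write_sorted_run(counts, directory))
            counts = Counter()
    if counts:
        runs.append(write_sorted_run(counts, directory))
    # Merge the sorted runs lazily and sum counts of equal n-grams.
    merged = heapq.merge(*(read_run(path) for path in runs))
    for gram, group in itertools.groupby(merged, key=lambda pair: pair[0]):
        yield gram, sum(count for _, count in group)
```

Because every run is sorted, the merge only needs one buffered line per run at a time, so memory use depends on `max_entries` rather than on corpus size, while disk use grows with the data.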

Original language | English |
---|---|
Title of host publication | Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013, 4-9 August 2013, Sofia, Bulgaria, Volume 2: Short Papers |
Pages | 690-696 |
Number of pages | 7 |
Publication status | Published - 2013 |