N-gram language models for massively parallel devices

Nikolay Bogoychev, Adam Lopez

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


For many applications, the query speed of N-gram language models is a computational bottleneck. Although massively parallel hardware like GPUs offers a potential solution to this bottleneck, exploiting this hardware requires a careful rethinking of basic algorithms and data structures. We present the first language model designed for such hardware, using B-trees to maximize data parallelism and minimize memory footprint and latency. Compared with a single-threaded instance of KenLM (Heafield, 2011), a highly optimized CPU-based language model, our GPU implementation produces identical results with a smaller memory footprint and a sixfold increase in throughput on a batch query task. When we saturate both devices, the GPU delivers nearly twice the throughput per hardware dollar even when the CPU implementation uses faster data structures.
Original language: English
Title of host publication: The 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016)
Place of Publication: Berlin, Germany
Publisher: Association for Computational Linguistics
Number of pages: 10
ISBN (Print): 978-1-945626-00-5
Publication status: Published - 12 Aug 2016
Event: 54th Annual Meeting of the Association for Computational Linguistics - Berlin, Germany
Duration: 7 Aug 2016 - 12 Aug 2016


Conference: 54th Annual Meeting of the Association for Computational Linguistics
Abbreviated title: ACL 2016

