The ups and downs of large language model inference with vocabulary trimming by language heuristics

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Deploying large language models (LLMs) is challenging due to their intensive computational and memory requirements. Our research examines vocabulary trimming (VT), inspired by restricting embedding entries to the language of interest, as a way to bolster time and memory efficiency. While such modifications have proven effective in tasks like machine translation, tailoring them to LLMs demands specific adjustments given the diverse nature of LLM applications. We apply two language heuristics, Unicode-based script filtering and corpus-based selection, to trim the full vocabulary of different LLM families and sizes. The methods are straightforward, interpretable, and easy to implement. We find that VT reduces the memory usage of small models by nearly 50% and yields up to a 25% improvement in generation speed. Yet we also reveal the limitations of these methods: they do not perform consistently well across languages, and the gains diminish in larger models.
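
The abstract names the two heuristics without spelling them out here, so the following is a minimal Python sketch of what such filters could look like. It is not the authors' implementation: the function names, the name-prefix approximation of Unicode scripts (Python's unicodedata exposes character names rather than script properties), and the get_vocab()-style token-to-id mapping are all assumptions made for illustration.

```python
# Illustrative sketch of the two vocabulary-trimming heuristics;
# not the paper's code. Assumes `vocab` is a token-to-id mapping,
# e.g. the dict returned by a Hugging Face tokenizer's get_vocab().
import unicodedata
from collections import Counter


def unicode_script_filter(vocab, allowed_prefixes=("LATIN",)):
    """Keep tokens whose alphabetic characters all belong to the allowed
    scripts, approximated by Unicode character-name prefixes."""
    kept = {}
    for token, token_id in vocab.items():
        if all(
            unicodedata.name(ch, "").startswith(allowed_prefixes)
            for ch in token
            if ch.isalpha()
        ):
            kept[token] = token_id
    return kept


def corpus_based_filter(vocab, tokenized_corpus, top_k=32000):
    """Keep the top_k tokens most frequent in a tokenized sample of the
    target-language corpus, retaining their original ids from `vocab`."""
    counts = Counter(tok for sent in tokenized_corpus for tok in sent)
    keep = {tok for tok, _ in counts.most_common(top_k)}
    return {tok: i for tok, i in vocab.items() if tok in keep}


# Toy usage: only the Latin-script tokens survive script filtering.
vocab = {"hello": 0, "мир": 1, "▁world": 2, "世界": 3}
print(unicode_script_filter(vocab))  # {'hello': 0, '▁world': 2}
```

In an actual deployment one would still gather the embedding and LM-head rows corresponding to the kept ids and remap token ids accordingly; that model-specific surgery is omitted from the sketch.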
Original language: English
Title of host publication: Proceedings of the Fifth Workshop on Insights from Negative Results in NLP
Editors: Shabnam Tafreshi, Arjun Akula, João Sedoc, Aleksandr Drozd, Anna Rogers, Anna Rumshisky
Publisher: Association for Computational Linguistics
Pages: 148–153
Number of pages: 6
ISBN (Electronic): 9798891761025
Publication status: Published - 20 Jun 2024
Event: Workshop on Insights from Negative Results in NLP
Duration: 20 Jun 2024 – 20 Jun 2024

