Terminology-Aware Translation with Constrained Decoding and Large Language Model Prompting

Nikolay Bogoychev, Pinzhen Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Terminology correctness is important in the downstream application of machine translation, and a prevalent way to ensure this is to inject terminology constraints into a translation system. In our submission to the WMT 2023 terminology translation task, we adopt a translate-then-refine approach which can be domain-independent and requires minimal manual effort. We annotate random source words with pseudo-terminology translations obtained from word alignment to first train a terminology-aware model. Further, we explore two post-processing methods. First, we use an alignment process to discover whether a terminology constraint has been violated, and if so, we re-decode with the violating word negatively constrained. Alternatively, we leverage a large language model to refine a hypothesis by providing it with terminology constraints. Results show that our terminology-aware model learns to incorporate terminologies effectively, and the large language model refinement process can further improve terminology recall.
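The refinement stage described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it substitutes a simple surface-string check for the paper's word-alignment step, and the prompt wording and function names are hypothetical.

```python
def find_violations(hypothesis, terminology):
    """Return terminology pairs whose target term is absent from the
    hypothesis. Simplified: a case-insensitive substring check stands
    in for the alignment-based violation detection in the paper."""
    return {src: tgt for src, tgt in terminology.items()
            if tgt.lower() not in hypothesis.lower()}

def build_refinement_prompt(source, hypothesis, terminology):
    """Assemble an LLM prompt that asks for a refined translation
    satisfying the terminology constraints (illustrative wording only,
    not the exact prompt used in the submission)."""
    constraints = "; ".join(f'"{s}" must be translated as "{t}"'
                            for s, t in terminology.items())
    return (f"Source: {source}\n"
            f"Draft translation: {hypothesis}\n"
            f"Terminology constraints: {constraints}\n"
            "Rewrite the draft translation so that every constraint "
            "is satisfied.")

# Example with made-up German-English terminology entries:
hyp = "The model uses attention layers."
terms = {"Aufmerksamkeit": "attention", "Schicht": "tier"}
violations = find_violations(hyp, terms)  # "tier" is missing, so it is flagged
```

In the paper's pipeline, a detected violation either triggers re-decoding with the offending word negatively constrained, or the hypothesis and constraints are handed to a large language model via a prompt like the one built here.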
Original language: English
Title of host publication: Proceedings of the Eighth Conference on Machine Translation
Publisher: Association for Computational Linguistics
Pages: 890-896
ISBN (Electronic): 979-8-89176-041-7
DOIs
Publication status: Published - 6 Dec 2023
Event: Eighth Conference on Machine Translation - Singapore, Singapore
Duration: 6 Dec 2023 - 7 Dec 2023
Conference number: 8
https://machinetranslate.org/wmt23

Conference

Conference: Eighth Conference on Machine Translation
Abbreviated title: WMT 2023
Country/Territory: Singapore
City: Singapore
Period: 6/12/23 - 7/12/23
