Abstract
Machine Translation of Culture-Specific Items (CSIs) poses significant challenges. Recent work on CSI translation has shown some success in using Large Language Models (LLMs) to adapt to different languages and cultures; however, a deeper analysis is needed to examine the benefits and pitfalls of each method. In this paper, we introduce the ChineseMenuCSI dataset, the largest Chinese-English menu corpus, annotated with CSI vs. non-CSI labels and accompanied by a fine-grained test set. We define three levels of CSI figurativeness for a more nuanced analysis and develop a novel methodology for automatic CSI identification, which outperforms GPT-based prompting in most categories. Importantly, we are the first to integrate human translation theories into LLM-driven translation processes, significantly improving translation accuracy, with COMET scores increasing by up to 7 points. The code and dataset are available at https://github.com/Henry8772/ChineseMenuCSI.
Original language | English |
---|---|
Title of host publication | Proceedings of the Ninth Conference on Machine Translation |
Editors | Barry Haddow, Tom Kocmi, Philipp Koehn, Christof Monz |
Publisher | Association for Computational Linguistics |
Pages | 1258–1271 |
Number of pages | 14 |
ISBN (Electronic) | 9798891761797 |
Publication status | Published - 16 Nov 2024 |
Event | Ninth Conference on Machine Translation, Miami, United States (15 Nov 2024 → 16 Nov 2024) |
Conference
Conference | Ninth Conference on Machine Translation |
---|---|
Abbreviated title | WMT24 |
Country/Territory | United States |
City | Miami |
Period | 15/11/24 → 16/11/24 |