Abstract
Instruction tuning a large language model on multiple languages can prepare it for multilingual downstream tasks. Nonetheless, it is yet to be determined whether a handful of languages is sufficient, or whether the benefits increase with the inclusion of more. By fine-tuning large multilingual models on 1 to 52 languages, we present a case study on BLOOM to understand three pertinent factors affecting performance: the number of languages, language exposure, and similarity between training and test languages. Overall, we found that 1) expanding language coverage in multilingual instruction tuning proves to be beneficial; 2) accuracy often improves significantly if the test language appears in the instruction mixture; 3) languages' genetic features correlate with cross-lingual transfer more than merely the number of languages, but different languages benefit to various degrees.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 31st International Conference on Computational Linguistics |
| Editors | Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert |
| Place of Publication | Abu Dhabi, UAE |
| Publisher | Association for Computational Linguistics |
| Pages | 2575-2581 |
| Number of pages | 7 |
| ISBN (Print) | 9798891761964 |
| Publication status | Published - 1 Jan 2025 |
| Event | The 31st International Conference on Computational Linguistics, Abu Dhabi, United Arab Emirates. Duration: 19 Jan 2025 → 24 Jan 2025. Conference number: 31. http://www.wikicfp.com/cfp/servlet/event.showcfp?copyownerid=90704&eventid=180678 |
Publication series
| Name | Proceedings – International Conference on Computational Linguistics |
|---|---|
| Publisher | Association for Computational Linguistics |
| ISSN (Print) | 2951-2093 |
Conference
| Conference | The 31st International Conference on Computational Linguistics |
|---|---|
| Abbreviated title | COLING 2025 |
| Country/Territory | United Arab Emirates |
| City | Abu Dhabi |
| Period | 19/01/25 → 24/01/25 |
Title
How Many Languages Make Good Multilingual Instruction Tuning? A Case Study on BLOOM