Abstract / Description of output
As handling code-switching becomes an increasingly important topic in speech technology, driven by the expansion of low-resource and multilingual methodologies, it is vital that we recognize the diversity of code-switching as a phenomenon. We propose a framework that leverages linguistic findings as makeshift ground truths to assess the quality and sufficiency of existing metrics designed to capture datasets' differing code-switching styles. We also introduce a new metric, T-index, which leverages machine translation systems to capture properties of code-switched words in relation to the participating language pair. Through analysis of diverse Hindi-English and Mandarin-English datasets, we systematically explore how well these metrics align with linguistic intuition regarding code-switching richness levels in conversational versus technical domains.
Original language | English |
---|---|
Title of host publication | Proceedings of Interspeech 2024 |
Editors | Itshak Lapidot, Sharon Gannot |
Publisher | ISCA |
Pages | 7-11 |
Number of pages | 5 |
DOIs | |
Publication status | Published - 5 Sept 2024 |
Event | The 25th Interspeech Conference, Kipriotis International Convention Center, Kos Island, Greece; Duration: 1 Sept 2024 → 5 Sept 2024; Conference number: 25; https://interspeech2024.org/ |
Publication series
Name | Proceedings of Interspeech |
---|---|
Publisher | ISCA |
ISSN (Electronic) | 2958-1796 |
Conference
Conference | The 25th Interspeech Conference |
---|---|
Abbreviated title | Interspeech 2024 |
Country/Territory | Greece |
City | Kos Island |
Period | 1/09/24 → 5/09/24 |
Internet address | https://interspeech2024.org/ |
Keywords / Materials (for Non-textual outputs)
- speech recognition
- code-switching
- multilingual
- linguistics
- computational linguistics