Characterizing code-switching: Applying linguistic principles for metric assessment and development

Jie Chi, Electra Wallington, Peter Bell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

As the handling of code-switching becomes an increasingly important topic in speech technology, driven by the expansion of low-resource and multilingual methodologies, it is vital that we recognize the diversity of code-switching as a phenomenon. We propose a framework that leverages linguistic findings as makeshift ground truths to assess the quality and sufficiency of existing metrics designed to capture datasets' differing code-switching styles. We also introduce a new metric, T-index, which leverages machine translation systems to capture properties of code-switched words in relation to the participating language pair. Through analysis of diverse Hindi-English and Mandarin-English datasets, we systematically explore how well these metrics align with linguistic intuition regarding code-switching richness levels in conversational versus technical domains.
Original language: English
Title of host publication: Proceedings of Interspeech 2024
Editors: Itshak Lapidot, Sharon Gannot
Publisher: ISCA
Pages: 7-11
Number of pages: 5
Publication status: Published - 5 Sept 2024
Event: The 25th Interspeech Conference - Kipriotis International Convention Center, Kos Island, Greece
Duration: 1 Sept 2024 to 5 Sept 2024
Conference number: 25
https://interspeech2024.org/

Publication series

Name: Proceedings of Interspeech
Publisher: ISCA
ISSN (Electronic): 2958-1796

Conference

Conference: The 25th Interspeech Conference
Abbreviated title: Interspeech 2024
Country/Territory: Greece
City: Kos Island
Period: 1/09/24 to 5/09/24
Keywords / Materials (for Non-textual outputs)

  • speech recognition
  • code-switching
  • multilingual
  • linguistics
  • computational linguistics
