Abstract
Motivated by the social and moral imperative for more inclusive speech technology, the community has shown growing interest in systems that generate high-fidelity accents across various speech generation tasks. In Zero-Shot Text-to-Speech (ZS-TTS), accent hallucination/mismatch, where the generated speech deviates in accent from the reference speech, has been reported and addressed [1]. In Accented TTS, numerous attempts have been made to generate high-fidelity accents from pre-defined accent variety labels or intensity levels [2, 3]. In Accent Conversion (AC), work has focused on mapping speech from a foreign to a native accent, preserving content and speaker information while removing the foreign accent of the source [4]. However, how to evaluate accent similarity in speech remains under-researched and lacks consensus.
| Original language | English |
|---|---|
| Pages | 35 |
| Number of pages | 1 |
| Publication status | Published - 2025 |
| Event | UK and Ireland Speech Conference, University of York, York, United Kingdom. 16 Jun 2025 → 17 Jun 2025. https://sites.google.com/york.ac.uk/ukis2025 |
Conference
| Conference | UK and Ireland Speech Conference |
|---|---|
| Abbreviated title | UKIS |
| Country/Territory | United Kingdom |
| City | York |
| Period | 16/06/25 → 17/06/25 |
| Internet address | https://sites.google.com/york.ac.uk/ukis2025 |
Fingerprint
Dive into the research topics of 'Pairwise evaluation of accent similarity in speech synthesis'.