Pairwise evaluation of accent similarity in speech synthesis

Jinzuomu Zhong, Suyuan Liu, Dan Wells, Korin Richmond

Research output: Contribution to conference › Abstract › peer-review

Abstract

Motivated by the social and moral imperative for more inclusive speech technology, the community has shown growing interest in systems capable of generating speech with high-fidelity accents across various speech generation tasks. In Zero-Shot Text-to-Speech (ZS-TTS), accent hallucination/mismatch, where the generated speech deviates in accent from the reference speech, has been reported and addressed in [1]. In Accented TTS, numerous attempts have been made to generate high-fidelity accents based on pre-defined accent variety labels or intensity levels [2, 3]. In Accent Conversion (AC), many works map speech from a foreign to a native accent, preserving content and speaker information while removing the foreign accent in the source [4]. However, how to evaluate accent similarity in speech remains under-researched and lacks consensus.
Original language: English
Pages: 35
Number of pages: 1
Publication status: Published - 2025
Event: UK and Ireland Speech Conference - University of York, York, United Kingdom
Duration: 16 Jun 2025 → 17 Jun 2025
https://sites.google.com/york.ac.uk/ukis2025

Conference

Conference: UK and Ireland Speech Conference
Abbreviated title: UKIS
Country/Territory: United Kingdom
City: York
Period: 16/06/25 → 17/06/25
