Low-dimensional style token control for hyperarticulated speech synthesis

Miku Nishihara, Dan Wells, Korin Richmond, Aidan Pine

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

Global style tokens (GSTs) allow for rich modelling of the variation in a speech corpus and subsequent control of text-to-speech synthesis (TTS). However, certain styles of speech may be marked by variation along multiple dimensions, complicating the interpretation and control of learned style tokens. One example is hyperarticulated or ‘clear’ speech, such as that directed toward listeners with hearing impairments or language learners in the classroom, which in English is characterised by reduced speaking rate, increased F0, more careful articulation of vowels and plosive consonants, and other factors. We present a method for simplifying control of style tokens by applying principal components analysis (PCA) to GST weights from a TTS system trained on both plain and clear speech. We identify the axes of variation in PCA space with the acoustic correlates of clear speech in English and show that we can synthesise either style by moving along a single dimension in that space.
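The idea of controlling style through a single PCA dimension can be illustrated with a minimal sketch. It is not the authors' implementation: the GST weight vectors below are synthetic stand-ins generated at random, and the helper names `fit_pca` and `weights_at` are hypothetical. In a real system the weights would be extracted from the trained TTS model for plain and clear reference utterances, and the resulting vector would be passed to the synthesiser in place of the predicted style weights.

```python
import numpy as np


def fit_pca(weights):
    """Fit PCA with an SVD; return (mean, principal axes, explained variance ratio)."""
    mean = weights.mean(axis=0)
    centered = weights - mean
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2
    return mean, vt, var / var.sum()


def weights_at(mean, components, coord, axis=0):
    """Map a position along one principal axis back to GST weight space."""
    return mean + coord * components[axis]


# Synthetic stand-in data (an assumption): two clusters of 10-dimensional
# GST weight vectors, one per speaking style.
rng = np.random.default_rng(0)
n_tokens, n_utts = 10, 200
plain_centre = rng.dirichlet(np.ones(n_tokens))
clear_centre = rng.dirichlet(np.ones(n_tokens))
plain = plain_centre + 0.01 * rng.standard_normal((n_utts, n_tokens))
clear = clear_centre + 0.01 * rng.standard_normal((n_utts, n_tokens))
gst_weights = np.vstack([plain, clear])

mean, components, ratio = fit_pca(gst_weights)

# Project every utterance onto the first principal component and find
# where each style sits along it.
pc1 = (gst_weights - mean) @ components[0]
plain_pos = pc1[:n_utts].mean()
clear_pos = pc1[n_utts:].mean()

# Move along the single dimension from the plain end to the clear end.
for alpha in np.linspace(0.0, 1.0, 5):
    coord = (1 - alpha) * plain_pos + alpha * clear_pos
    w = weights_at(mean, components, coord)
    print(f"alpha={alpha:.2f}  PC1={coord:+.3f}  weights={np.round(w, 3)}")
```

In this toy setting the first component separates the two styles almost by construction. With weights from a trained model, the component that tracks the plain/clear contrast would need to be identified empirically, for example by relating positions in PCA space to acoustic measures such as speaking rate and F0. Reconstructed weights are also not guaranteed to be non-negative or to sum to one, which may or may not matter depending on how the TTS system consumes them.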
Original language: English
Title of host publication: Interspeech 2024
Publisher: International Speech Communication Association (ISCA)
Pages: 1-5
Number of pages: 5
DOIs
Publication status: Published - 1 Sept 2024
Event: The 25th Interspeech Conference - Kipriotis International Convention Center, Kos Island, Greece
Duration: 1 Sept 2024 to 5 Sept 2024
Conference number: 25
https://interspeech2024.org/

Publication series

Name: Interspeech
Publisher: International Speech Communication Association (ISCA)
ISSN (Electronic): 2958-1796

Conference

Conference: The 25th Interspeech Conference
Abbreviated title: Interspeech 2024
Country/Territory: Greece
City: Kos Island
Period: 1/09/24 to 5/09/24
Internet address
