Gaussian process dynamical models for nonparametric speech representation and synthesis

Gustav Eje Henter, Marcus R. Frean, W. Bastiaan Kleijn

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We propose Gaussian process dynamical models (GPDMs) as a new, nonparametric paradigm in acoustic models of speech. These use multidimensional, continuous state-spaces to overcome familiar issues with discrete-state, HMM-based speech models. The added dimensions allow the state to represent and describe more than just temporal structure as systematic differences in mean, rather than as mere correlations in a residual (which dynamic features or AR-HMMs do). Being based on Gaussian processes, the models avoid restrictive parametric or linearity assumptions on signal structure. We outline GPDM theory, and describe model setup and initialization schemes relevant to speech applications. Experiments demonstrate subjectively better quality of synthesized speech than from comparable HMMs. In addition, there is evidence for unsupervised discovery of salient speech structure.
Original language: English
Title of host publication: Proc. ICASSP
Place of publication: Kyoto, Japan
Pages: 4505-4508
Number of pages: 4
Volume: 37
Publication status: Published - 1 Mar 2012

Keywords

  • acoustic models
  • stochastic models
  • non-parametric speech synthesis
  • sampling
