Articulatory Control of HMM-based Parametric Speech Synthesis using Feature-Space-Switched Multiple Regression

Z. Ling, K. Richmond, J. Yamagishi

Research output: Contribution to journal › Article › peer-review

Abstract / Description of output

In previous work we proposed a method to control the characteristics of synthetic speech flexibly by integrating articulatory features into a hidden Markov model (HMM) based parametric speech synthesiser. In this method, a unified acoustic-articulatory model is trained, and context-dependent linear transforms are used to model the dependency between the two feature streams. In this paper, we go significantly further and propose a feature-space-switched multiple regression HMM to improve the performance of articulatory control. A multiple regression HMM (MRHMM) is adopted to model the distribution of acoustic features, with articulatory features used as exogenous "explanatory variables". A separate Gaussian mixture model (GMM) is introduced to model the articulatory space, and articulatory-to-acoustic regression matrices are trained for each component of this GMM, instead of for the context-dependent states in the HMM. Furthermore, we propose a task-specific context feature tailoring method to ensure compatibility between state context features and articulatory features that are manipulated at synthesis time. The proposed method is evaluated on two tasks, using a speech database with acoustic waveforms and articulatory movements recorded in parallel by electromagnetic articulography (EMA). In a vowel identity modification task, the new method reconstructs target vowels by varying articulatory inputs more successfully than our previous approach. A second task, vowel creation, shows that the new method is highly effective at producing a new vowel from appropriate articulatory representations; the resulting vowel is shown to sound highly natural, even though no acoustic samples for this vowel are present in the training data.
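The regression structure described in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: the abstract does not give the exact output distribution, so the code assumes a diagonal-covariance GMM over articulatory features, an augmented articulatory vector [y; 1], and a posterior-weighted ("soft-switched") combination of per-component regression matrices added to an HMM state mean. All function and parameter names (`gmm_posteriors`, `predict_acoustic_mean`, `reg_matrices`, and so on) are hypothetical.

```python
import numpy as np


def gmm_posteriors(y, weights, means, variances):
    """Posterior P(k | y) for each component of a diagonal-covariance GMM.

    y: (Dy,) articulatory frame; weights: (K,); means, variances: (K, Dy).
    """
    diff = y - means
    # Per-component Gaussian log-likelihood with diagonal covariance.
    log_lik = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=1
    )
    log_post = np.log(weights) + log_lik
    log_post -= np.max(log_post)  # subtract the max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()


def predict_acoustic_mean(y, state_mean, reg_matrices, weights, means, variances):
    """Assumed form: state_mean + sum_k P(k | y) * A_k @ [y; 1].

    state_mean: (Dx,) mean of the current HMM state (assumption);
    reg_matrices: (K, Dx, Dy + 1), one regression matrix per GMM component.
    """
    xi = np.append(y, 1.0)                      # augmented articulatory vector
    post = gmm_posteriors(y, weights, means, variances)
    shifts = reg_matrices @ xi                  # (K, Dx): one shift per component
    return state_mean + post @ shifts           # posterior-weighted combination


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, Dy, Dx = 4, 6, 5                         # toy sizes, chosen arbitrarily
    weights = np.full(K, 1.0 / K)
    means = rng.normal(size=(K, Dy))
    variances = np.ones((K, Dy))
    reg_matrices = rng.normal(size=(K, Dx, Dy + 1))
    y = rng.normal(size=Dy)                     # a (possibly modified) articulatory frame
    print(predict_acoustic_mean(y, np.zeros(Dx), reg_matrices,
                                weights, means, variances))
```

The point of the sketch is that the choice of regression matrix depends on where the articulatory frame lies in articulatory space, not on the HMM state's context label, so changing the articulatory input at synthesis time also changes which transform is applied.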
Original language: English
Pages (from-to): 207-219
Number of pages: 13
Journal: IEEE Transactions on Audio, Speech and Language Processing
Issue number: 1
Early online date: 1 Sept 2012
Publication status: Published - Jan 2013

Keywords / Materials (for Non-textual outputs)

  • Articulatory features
  • Gaussian mixture model
  • multiple-regression hidden Markov model
  • speech synthesis

