Edinburgh Research Explorer

Kalman-filter based Join Cost for Unit-selection Speech Synthesis

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Original language: English
Title of host publication: Eurospeech 2003 - Interspeech 2003
Subtitle of host publication: 8th European Conference on Speech Communication and Technology
Publisher: International Speech Communication Association
Pages: 293-296
Number of pages: 4
ISBN (Print): ISSN: 1990-9772
Publication status: Published - 2003

Abstract

We introduce a new method for computing join cost in unit-selection speech synthesis which uses a linear dynamical model (also known as a Kalman filter) to model line spectral frequency trajectories. The model uses an underlying subspace in which it generates smooth, continuous trajectories. This subspace can be seen as analogous to underlying articulator movement. Once trained, the model can be used to measure how well concatenated speech segments join together. The objective join cost is based on the error between model predictions and actual observations. We report correlations between this measure and mean listener scores obtained from a perceptual listening experiment. Our experiments use a state-of-the-art unit-selection text-to-speech system: 'rVoice' from Rhetorical Systems Ltd.
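
To make the idea of a prediction-error join cost concrete, the following is a minimal sketch, not the authors' implementation. It assumes a standard linear-Gaussian state-space model with a low-dimensional hidden state, runs a Kalman filter across two concatenated segments, and averages the squared one-step prediction error over a few frames after the join. The function names (`kalman_prediction_errors`, `join_cost`), the `window` parameter, and all matrices are hypothetical toy values chosen for illustration. In the paper the model is trained on line spectral frequency data, and the exact error measure may differ.

```python
import numpy as np

def kalman_prediction_errors(obs, F, H, Q, R, x0, P0):
    """Run a Kalman filter over obs (T x d) and return the one-step-ahead
    prediction error (innovation) for every frame."""
    x, P = x0.copy(), P0.copy()
    errors = np.empty_like(obs, dtype=float)
    for t, y in enumerate(obs):
        # Predict the hidden state and its covariance one frame ahead.
        x = F @ x
        P = F @ P @ F.T + Q
        # Prediction error between the observation and the model's prediction.
        e = y - H @ x
        errors[t] = e
        # Correct the state estimate using the observation.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ e
        P = (np.eye(len(x)) - K @ H) @ P
    return errors

def join_cost(left, right, window=3, **model):
    """Mean squared prediction error over `window` frames after the join
    (hypothetical definition, for illustration only)."""
    frames = np.vstack([left, right])
    errors = kalman_prediction_errors(frames, **model)
    j = len(left)  # index of the first frame after the join
    span = errors[j:j + window]
    return float(np.mean(np.sum(span ** 2, axis=1)))

# Toy model (assumed values): a 2-dimensional hidden subspace observed
# through 4 parameters that stand in for line spectral frequencies.
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
model = dict(
    F=np.eye(2), H=H,
    Q=0.01 * np.eye(2), R=0.01 * np.eye(4),
    x0=np.zeros(2), P0=np.eye(2),
)

left = np.tile(H @ np.array([1.0, 2.0]), (10, 1))          # left segment
right_smooth = np.tile(H @ np.array([1.0, 2.0]), (5, 1))   # continues smoothly
right_jump = np.tile(H @ np.array([2.0, 3.0]), (5, 1))     # discontinuity at join

smooth_cost = join_cost(left, right_smooth, **model)
jump_cost = join_cost(left, right_jump, **model)
print(smooth_cost, jump_cost)
```

In this toy setup the discontinuous join yields a larger cost than the smooth one, because the filter's predictions, conditioned on the left segment, no longer match the observations that follow the join.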
