New Objective Distance Measures for Spectral Discontinuities in Concatenative Speech Synthesis

J. Vepa, S. King, Paul Taylor

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract / Description of output

The quality of unit-selection-based concatenative speech synthesis depends mainly on how well two successive units can be joined together to minimise audible discontinuities. The objective measure of discontinuity used when selecting units is known as the 'join cost'. The ideal join cost will measure 'perceived' discontinuity, based on easily measurable spectral properties of the units being joined, in order to ensure smooth and natural-sounding synthetic speech. In this paper we describe a perceptual experiment conducted to measure the correlation between 'subjective' human perception and various 'objective' spectrally based measures proposed in the literature. We also report new objective distance measures, derived from various distance metrics based on these spectral features, which correlate well with human perception of concatenation discontinuities. Our experiments used a state-of-the-art unit-selection text-to-speech system: 'rVoice' from Rhetorical Systems Ltd.
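To make the idea of a spectral join cost concrete, here is a minimal Python sketch. It assumes each unit is a sequence of spectral feature frames (for example MFCC vectors), measures the discontinuity between the last frame of the left unit and the first frame of the right unit with three common distance metrics, and checks how well the resulting costs track listener ratings using Pearson's r. The function names, the choice of metrics, the frame-at-the-boundary convention, and the toy numbers are all hypothetical illustrations, not the features, metrics, or data used in the paper or in rVoice.

```python
import math
from typing import Callable, Sequence

# Assumption: a frame is one spectral feature vector (e.g. MFCCs).
Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Euclidean (L2) distance between two frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def absolute(a: Frame, b: Frame) -> float:
    """Absolute (city-block, L1) distance between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b))


def make_weighted_euclidean(variances: Frame) -> Callable[[Frame, Frame], float]:
    """Euclidean distance with each dimension divided by its (positive) variance,
    i.e. a Mahalanobis distance with a diagonal covariance matrix."""
    def dist(a: Frame, b: Frame) -> float:
        return math.sqrt(sum((x - y) ** 2 / v for x, y, v in zip(a, b, variances)))
    return dist


def join_cost(left_unit: Sequence[Frame], right_unit: Sequence[Frame],
              metric: Callable[[Frame, Frame], float]) -> float:
    """Hypothetical join cost: distance between the last frame of the left
    unit and the first frame of the right unit."""
    return metric(left_unit[-1], right_unit[0])


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy joins (left unit, right unit) and listener ratings:
    # illustrative values only, not data from the paper.
    joins = [
        ([[1.0, 0.5], [1.1, 0.4]], [[1.2, 0.4], [1.3, 0.3]]),
        ([[0.2, 0.9], [0.3, 0.8]], [[1.5, 0.1], [1.6, 0.0]]),
        ([[0.7, 0.7], [0.6, 0.6]], [[0.9, 0.3], [1.0, 0.2]]),
    ]
    ratings = [1.0, 4.5, 2.5]  # hypothetical mean perceived discontinuity per join
    costs = [join_cost(left, right, euclidean) for left, right in joins]
    print("Euclidean join costs:", costs)
    print("Correlation with ratings:", pearson(costs, ratings))
```

Swapping `euclidean` for `absolute` or a `make_weighted_euclidean(...)` metric and comparing the resulting correlations mirrors, in miniature, the kind of comparison between objective measures and subjective judgements that the abstract describes.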
Original language: English
Title of host publication: Proceedings of the 2002 IEEE Workshop on Speech Synthesis
Number of pages: 4
Publication status: Published - 1 Sept 2002

