Simple methods for improving speaker-similarity of HMM-based speech synthesis

Junichi Yamagishi, Simon King

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

In this paper we revisit some basic configuration choices of HMM-based speech synthesis, such as waveform sampling rate, auditory frequency warping scale and the logarithmic scaling of F0, with the aim of improving speaker similarity, which is an acknowledged weakness of current HMM-based speech synthesisers. All of the techniques investigated are simple but, as we demonstrate using perceptual tests, can make substantial differences to the quality of the synthetic speech. Contrary to common practice in automatic speech recognition, higher waveform sampling rates can offer enhanced feature extraction and improved speaker similarity for speech synthesis. In addition, a generalized logarithmic transform of F0 results in larger intra-utterance variance of F0 trajectories and hence more dynamic and natural-sounding prosody.
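The abstract does not give the exact form of the generalized logarithmic transform. A minimal sketch follows, assuming the Box-Cox-style generalized logarithm familiar from mel-generalized cepstral analysis: s_gamma(x) = (x^gamma - 1)/gamma for gamma != 0, and ln x for gamma = 0. The paper's definition and choice of gamma may differ; the function names and the `gamma` parameter are hypothetical. The transform is applied to voiced F0 values only, and gamma = 0 recovers the conventional log-F0 scale.

```python
import math


def generalized_log(f0_hz: float, gamma: float) -> float:
    """Assumed generalized log transform of one voiced F0 value (Hz).

    gamma == 0 gives ln(f0), the usual log-F0 scale;
    gamma == 1 gives f0 - 1, i.e. an offset linear scale.
    """
    if f0_hz <= 0:
        raise ValueError("F0 must be positive (voiced frames only)")
    if gamma == 0:
        return math.log(f0_hz)
    return (f0_hz ** gamma - 1.0) / gamma


def inverse_generalized_log(y: float, gamma: float) -> float:
    """Map a transformed value back to F0 in Hz."""
    if gamma == 0:
        return math.exp(y)
    base = 1.0 + gamma * y  # equals f0 ** gamma for values produced by the forward transform
    if base <= 0:
        raise ValueError("value outside the range of the transform")
    return base ** (1.0 / gamma)


if __name__ == "__main__":
    for g in (0.0, 0.3, 1.0):
        y = generalized_log(120.0, g)
        print(g, y, inverse_generalized_log(y, g))
```

Because gamma between 0 and 1 interpolates between a logarithmic and a linear scale, it gives a single knob for how strongly high F0 values are compressed before statistical modelling.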
Original language: English
Title of host publication: Proc. ICASSP 2010
Publication status: Published, 2010

