Abstract
Speech units are highly context-dependent, so taking contextual features into account is essential for speech modelling. Context is employed in HMM-based text-to-speech synthesis systems via context-dependent phone models. A very wide context is taken into account, represented by a large set of contextual factors. However, most of these factors probably have no significant influence on the speech, most of the time. To discover which combinations of features should be taken into account, decision tree-based context clustering is used. But the space of context-dependent models is vast, and the number of contexts seen in the training data is only a tiny fraction of this space, so the task of the decision tree is very hard: to generalise from observations of a tiny fraction of the space to the rest of the space, whilst ignoring uninformative or redundant context features. The structure of the context feature space has not been systematically studied for speech synthesis. In this paper we discover a dependency structure by learning a Bayesian Network over the joint distribution of the features and the speech. We demonstrate that it is possible to discard the majority of context features with minimal impact on quality, measured by a perceptual test.
Original language | English |
---|---|
Title of host publication | INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012 |
Publisher | ISCA-INST SPEECH COMMUNICATION ASSOC |
Publication status | Published - 1 Sep 2012 |
Title | Using Bayesian Networks to find relevant context features for HMM-based speech synthesis |
Activities
- Invited talk: EACL 2014 keynote: Speech synthesis needs YOU!, Simon King (Speaker), 29 Apr 2014