Abstract / Description of output
This paper proposes the use of a new binary decision tree, which we call a soft decision tree, to improve generalization performance compared to the conventional ‘hard’ decision tree method that is used to cluster context-dependent model parameters in statistical parametric speech synthesis. We apply the method to improve the modeling of fundamental frequency, which is an important factor in synthesizing natural-sounding high-quality speech. Conventionally, hard decision tree-clustered hidden Markov models (HMMs) are used, in which each model parameter is assigned to a single leaf node. However, this ‘divide-and-conquer’ approach leads to data sparsity, with the consequence that it suffers from poor generalization, meaning that it is unable to accurately predict parameters for models of unseen contexts: the hard decision tree is a weak function approximator. To alleviate this, we propose the soft decision tree, which is a binary decision tree with soft decisions at the internal nodes. In this soft clustering method, internal nodes select both their children with certain membership degrees; therefore, each node can be viewed as a fuzzy set with a context-dependent membership function. The soft decision tree improves model generalization and provides a superior function approximator because it is able to assign each context to several overlapped leaves. In order to use such a soft decision tree to predict the parameters of the HMM output probability distribution, we derive the smoothest (maximum entropy) distribution which captures all partial first-order moments and a global second-order moment of the training samples. Employing such a soft decision tree architecture with maximum entropy distributions, a novel speech synthesis system is trained using maximum likelihood (ML) parameter re-estimation and synthesis is achieved via maximum output probability parameter generation. 
In addition, a soft decision tree construction algorithm optimizing a log-likelihood measure is developed. Both subjective and objective evaluations were conducted and indicate a considerable improvement over the conventional method.
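The soft-clustering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The logistic gate, the `sharpness` parameter, the feature names and the membership-weighted mean are all assumptions made for illustration. The abstract does not specify the membership function, and the paper's maximum entropy output distribution is not reproduced here. The sketch only shows how every internal node passes weight to both children, so that one context ends up with a membership degree in several leaves.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    name: str
    mean: float  # hypothetical leaf parameter, e.g. a log-F0 mean


@dataclass
class Node:
    feature: str      # hypothetical context feature name
    threshold: float  # split point on that feature
    sharpness: float  # assumed gate slope; large values approach a hard split
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def gate(z: float) -> float:
    """Logistic function, written with tanh so large |z| cannot overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def leaf_memberships(
    tree: Union[Node, Leaf], context: Dict[str, float], weight: float = 1.0
) -> List[Tuple[Leaf, float]]:
    """Return (leaf, membership degree) pairs; the degrees sum to `weight`."""
    if isinstance(tree, Leaf):
        return [(tree, weight)]
    # Soft decision: both children receive a share of the incoming weight.
    go_right = gate(tree.sharpness * (context[tree.feature] - tree.threshold))
    return (
        leaf_memberships(tree.left, context, weight * (1.0 - go_right))
        + leaf_memberships(tree.right, context, weight * go_right)
    )


def predict_mean(tree: Union[Node, Leaf], context: Dict[str, float]) -> float:
    """Membership-weighted average of leaf means (an illustrative choice)."""
    return sum(w * leaf.mean for leaf, w in leaf_memberships(tree, context))


# Hypothetical two-leaf tree over a made-up numeric context feature.
example_tree = Node("syllable_position", 0.0, 1.0,
                    Leaf("low", 100.0), Leaf("high", 200.0))
```

With a very large `sharpness` the gate saturates and the tree behaves like a conventional hard decision tree, assigning each context to a single leaf. Smaller values let neighbouring leaves share a context, which is the overlap the abstract credits for better generalization.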
Field | Value |
---|---|
Original language | English |
Article number | 2 |
Pages (from-to) | 1-17 |
Journal | EURASIP Journal on Advances in Signal Processing |
Volume | 2015 |
Issue number | 2 |
Early online date | 9 Jan 2015 |
Publication status | Published - 2015 |
Keywords / Materials (for Non-textual outputs)
- context clustering
- decision tree-based clustering
- F0 modeling
- Hidden Markov model
- HMM
- HMM-based speech synthesis
- maximum entropy model
- soft context clustering
- soft decision tree
- statistical parametric speech synthesis
Fingerprint
Dive into the research topics of 'Soft context clustering for F0 modeling in HMM-based speech synthesis'. Together they form a unique fingerprint.
Projects
- 3 Finished
Simple4All: Speech synthesis that improves through adaptive learning
1/11/11 → 31/10/14
Project: Research
Profiles
Simon King
- School of Philosophy, Psychology and Language Sciences - Personal Chair of Speech Processing
- Centre for Speech Technology Research
Person: Academic: Research Active