This paper introduces a novel form of parametric synthesis that uses context embeddings produced by the bottleneck layer of a deep neural network to guide the selection of models in a rich-context HMM-based synthesiser. Rich-context synthesis – in which Gaussian distributions estimated from single linguistic contexts seen in the training data are used for synthesis, rather than more conventional decision-tree-tied models – was originally proposed to address over-smoothing due to averaging across contexts. Our previous investigations have confirmed experimentally that averaging across different contexts is indeed one of the largest factors contributing to the limited quality of statistical parametric speech synthesis. However, a possible weakness of the rich-context approach as previously formulated is that a conventional tied model is still used to guide the selection of Gaussians at synthesis time. Our proposed approach replaces this with context embeddings derived from a neural network.
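The core selection step described above can be sketched as a nearest-neighbour search: each training context has an embedding taken from the network's bottleneck layer, and at synthesis time the model whose embedding lies closest to the target context's embedding is selected. The sketch below is a minimal illustration under assumed names and toy vectors; the abstract does not specify the distance metric or embedding dimensionality, so Euclidean distance and 3-dimensional embeddings are assumptions here.

```python
import math

def euclidean(a, b):
    # Euclidean distance between two equal-length embedding vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_rich_context_model(target_embedding, candidates):
    # Return the name of the training context whose bottleneck
    # embedding is nearest to the target context's embedding.
    return min(candidates, key=lambda name: euclidean(candidates[name], target_embedding))

# Hypothetical bottleneck embeddings for three training contexts
candidates = {
    "ctx_a": [0.1, 0.9, 0.2],
    "ctx_b": [0.8, 0.1, 0.5],
    "ctx_c": [0.3, 0.7, 0.4],
}
chosen = select_rich_context_model([0.15, 0.85, 0.25], candidates)
print(chosen)  # -> ctx_a (the closest training context in embedding space)
```

In the paper's setting the selected context's Gaussian distributions would then be used directly for parameter generation, in place of the decision-tree-tied model's averaged statistics.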
| Title of host publication | Proceedings of Interspeech 2015 |
| Place of publication | Dresden |
| Publisher | International Speech Communication Association |
| Publication status | Published - 6 Sep 2015 |
| Event | Interspeech 2015 - Dresden, Germany |
| Duration | 6 Sep 2015 → 9 Sep 2015 |
- Speech synthesis
- Deep neural networks
- Statistical parametric speech synthesis
Listening test materials for "Deep neural network context embeddings for model selection in rich-context HMM synthesis"
King, S. (Creator) & Merritt, T. (Creator), Edinburgh DataShare, 8 Jun 2015