Edinburgh Research Explorer

Deep neural network context embeddings for model selection in rich-context HMM synthesis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Open Access permissions: Open

Documents


    Rights statement: © Merritt, T., Yamagishi, J., Wu, Z., Watts, O., & King, S. (2015). Deep neural network context embeddings for model selection in rich-context HMM synthesis. In Proceedings of Interspeech 2015. Dresden: International Speech Communication Association. 10.7488/ds/256

    Accepted author manuscript, 129 KB, PDF-document

    Licence: Creative Commons: Attribution No Derivatives (CC-BY-ND)

http://datashare.is.ed.ac.uk/handle/10283/789
Original language: English
Title of host publication: Proceedings of Interspeech 2015
Place of publication: Dresden
Publisher: International Speech Communication Association
DOIs
Publication status: Published - 6 Sep 2015
Event: Interspeech 2015 - Dresden, Germany
Duration: 6 Sep 2015 - 9 Sep 2015

Conference

Conference: Interspeech 2015
Country: Germany
City: Dresden
Period: 6/09/15 - 9/09/15

Abstract

This paper introduces a novel form of parametric synthesis that uses context embeddings produced by the bottleneck layer of a deep neural network to guide the selection of models in a rich-context HMM-based synthesiser. Rich-context synthesis, in which Gaussian distributions estimated from single linguistic contexts seen in the training data are used for synthesis rather than the more conventional decision-tree-tied models, was originally proposed to address over-smoothing due to averaging across contexts. Our previous investigations have confirmed experimentally that averaging across different contexts is indeed one of the largest factors contributing to the limited quality of statistical parametric speech synthesis. However, a possible weakness of the rich-context approach as previously formulated is that a conventional tied model is still used to guide the selection of Gaussians at synthesis time. Our proposed approach replaces this with context embeddings derived from a neural network.
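The selection mechanism described in the abstract can be illustrated with a minimal sketch: linguistic context features are passed through a network whose narrow bottleneck layer yields a low-dimensional embedding, and the rich-context Gaussian whose training context lies closest in that embedding space is selected for synthesis. All names, dimensions, and the Euclidean distance metric below are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of bottleneck-embedding-based model selection.
# Dimensions, weights, and distance metric are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

def bottleneck_embed(context_features, weights):
    """Map linguistic context features through hidden layers down to a
    narrow bottleneck layer; the bottleneck activations serve as the
    context embedding."""
    h = context_features
    for w in weights:
        h = np.tanh(h @ w)
    return h

# Toy network: 20-dim context features -> 16 hidden units -> 4-dim bottleneck.
weights = [rng.standard_normal((20, 16)) * 0.1,
           rng.standard_normal((16, 4)) * 0.1]

# Embeddings for every training context; in rich-context synthesis each
# training context has its own (untied) Gaussian distribution.
train_contexts = rng.standard_normal((100, 20))
train_embeddings = bottleneck_embed(train_contexts, weights)

def select_model(target_context, train_embeddings, weights):
    """Return the index of the training context whose embedding is closest
    to the target context's embedding; its rich-context Gaussian would
    then be used for synthesis instead of a decision-tree-tied model."""
    e = bottleneck_embed(target_context, weights)
    dists = np.linalg.norm(train_embeddings - e, axis=1)
    return int(np.argmin(dists))

idx = select_model(rng.standard_normal(20), train_embeddings, weights)
print(idx)  # index of the selected rich-context model
```

In a real system the network would be trained to predict acoustic targets from context features, so that nearby embeddings correspond to acoustically similar contexts; the random weights here only demonstrate the selection mechanics.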

    Research areas

  • Speech synthesis, Deep neural networks, Statistical parametric speech synthesis



ID: 19840193