Edinburgh Research Explorer

Combining a Vector Space Representation of Linguistic Context with a Deep Neural Network for Text-To-Speech Synthesis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Original language: English
Title of host publication: 8th ISCA Speech Synthesis Workshop
Pages: 261-265
Number of pages: 5
Publication status: Published - 1 Aug 2013
Event: 8th ISCA Speech Synthesis Workshop - Barcelona, Spain
Duration: 31 Aug 2013 → …

Conference

Conference: 8th ISCA Speech Synthesis Workshop
Country: Spain
City: Barcelona
Period: 31/08/13 → …

Abstract

Conventional statistical parametric speech synthesis relies on decision trees to cluster together similar contexts, resulting in tied-parameter context-dependent hidden Markov models (HMMs). However, decision tree clustering has a major weakness: it makes hard splits and subdivides the model space based on one feature at a time, fragmenting the data and failing to exploit interactions between linguistic context features. These linguistic features are themselves problematic, being noisy and of varying relevance to the acoustics. We propose to combine our previous work on vector space representations of linguistic context, which have the added advantage of working directly from textual input, with Deep Neural Networks (DNNs), which can directly accept such continuous representations as input. The outputs of the network are probability distributions over speech features. Maximum Likelihood Parameter Generation (MLPG) is then used to create parameter trajectories, which in turn drive a vocoder to generate the waveform. Various configurations of the system are compared, using both conventional and vector space context representations, and with the DNN making speech parameter predictions at two different temporal resolutions: frames or states. Both objective and subjective results are presented.
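The parameter-generation step named in the abstract can be illustrated with a minimal sketch of standard MLPG for a single feature dimension. This is not the authors' implementation. It assumes diagonal Gaussians over static and first-order delta features only, a central-difference delta window with the edges clamped, and NumPy. The function name mlpg_1d is hypothetical. In the system described, the per-frame means and variances would come from the DNN's output distributions; here they are passed in directly as arrays.

```python
import numpy as np


def mlpg_1d(means: np.ndarray, variances: np.ndarray) -> np.ndarray:
    """Sketch of Maximum Likelihood Parameter Generation for one dimension.

    means, variances: shape (T, 2), holding per frame the predicted
    [static, delta] Gaussian mean and variance (diagonal covariance assumed).
    Returns the static trajectory c, shape (T,), maximising the likelihood
    of the stacked [static, delta] observations W @ c.
    """
    T = means.shape[0]
    # W maps the static trajectory c (T,) to interleaved [static, delta] (2T,).
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0  # static row: o_t = c_t
        # Delta row: 0.5 * (c_{t+1} - c_{t-1}); edge frames are clamped (assumption).
        prev_t, next_t = max(t - 1, 0), min(t + 1, T - 1)
        W[2 * t + 1, next_t] += 0.5
        W[2 * t + 1, prev_t] -= 0.5
    mu = means.reshape(-1)  # interleaved [s_0, d_0, s_1, d_1, ...]
    precision = 1.0 / variances.reshape(-1)  # diagonal of U^{-1}
    # Normal equations: (W^T U^{-1} W) c = W^T U^{-1} mu
    WtP = W.T * precision  # broadcasting: equals W.T @ np.diag(precision)
    return np.linalg.solve(WtP @ W, WtP @ mu)


if __name__ == "__main__":
    # Static means on a straight line with matching deltas:
    # the generated trajectory should reproduce the line.
    static = np.arange(5, dtype=float)
    delta = np.array([0.5, 1.0, 1.0, 1.0, 0.5])
    means = np.stack([static, delta], axis=1)
    variances = np.ones_like(means)
    print(mlpg_1d(means, variances))
```

The delta rows are what couple neighbouring frames: with very large delta variances the result collapses to the static means, while confident delta predictions pull the trajectory towards a smooth path rather than a frame-by-frame (or state-by-state) staircase.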



ID: 25291172