Abstract / Description of output
A top-down hierarchical system based on deep neural networks is investigated for the modeling of prosody in speech synthesis. Suprasegmental features are processed separately from segmental features and a compact distributed representation of high-level units is learned at the syllable level. The suprasegmental representation is then integrated into a frame-level network. Objective measures show that balancing segmental and suprasegmental features can be useful for the frame-level network. Additional features incorporated into the hierarchical system are then tested. At the syllable level, a bag-of-phones representation is proposed and, at the word level, embeddings learned from text sources are used. It is shown that the hierarchical system is able to leverage new features at higher levels more efficiently than a system which exploits them directly at the frame level. A perceptual evaluation of the proposed systems is conducted and followed by a discussion of the results.
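To illustrate the bag-of-phones idea mentioned in the abstract, the following minimal Python sketch encodes a syllable as an unordered count vector over a phone inventory. The toy inventory `PHONES`, the function name `bag_of_phones`, and the use of counts rather than binary indicators are assumptions made for illustration; the abstract does not specify the paper's exact formulation.

```python
from collections import Counter

# Hypothetical toy phone inventory (an assumption for this sketch);
# a real system would use the full phone set of its text front end.
PHONES = ["k", "ae", "t", "s", "ih", "ng"]


def bag_of_phones(syllable_phones, inventory=PHONES):
    """Return a fixed-length count vector over `inventory`, ignoring phone order."""
    counts = Counter(syllable_phones)
    unknown = set(counts) - set(inventory)
    if unknown:
        raise ValueError(f"phones not in inventory: {sorted(unknown)}")
    return [counts[p] for p in inventory]


# The syllable "cats" as the phone sequence k-ae-t-s.
vector = bag_of_phones(["k", "ae", "t", "s"])
print(vector)  # [1, 1, 1, 1, 0, 0]
```

Because the encoding discards order, two syllables containing the same phones map to the same vector. In exchange, every syllable gets a fixed-length input regardless of how many phones it contains.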
Original language | English |
---|---|
Title of host publication | Interspeech 2016 |
Publisher | International Speech Communication Association |
Pages | 3186-3190 |
Number of pages | 5 |
Publication status | Published - 12 Sept 2016 |
Event | Interspeech 2016, San Francisco, United States. Duration: 8 Sept 2016 → 12 Sept 2016. http://www.interspeech2016.org/ |
Publication series
Publisher | International Speech Communication Association |
---|---|
ISSN (Print) | 1990-9772 |
Conference
Conference | Interspeech 2016 |
---|---|
Country/Territory | United States |
City | San Francisco |
Period | 8/09/16 → 12/09/16 |
Fingerprint
Dive into the research topics of 'Syllable-Level Representations of Suprasegmental Features for DNN-Based Text-to-Speech Synthesis'. Together they form a unique fingerprint.
Projects
- 2 Finished
- SIWIS: Spoken Interaction with Interpretation in Switzerland
  Yamagishi, J.
  1/12/12 → 30/11/16
  Project: Research
-