Abstract / Description of output
This paper presents a simple but effective method for generating speech waveforms by selecting small units of stored speech to match a low-dimensional target representation. The method is designed as a drop-in replacement for the vocoder in a deep neural network-based text-to-speech system. Most previous work on hybrid unit selection waveform generation relies on phonetic annotation for determining unit boundaries, or for specifying target cost, or for candidate preselection. In contrast, our waveform generator requires no phonetic information, annotation, or alignment. Unit boundaries are determined by epochs, and spectral analysis provides representations which are compared directly with target features at runtime. As in unit selection, we minimise a combination of target cost and join cost, but find that greedy left-to-right nearest-neighbour search gives similar results to dynamic programming. The method is fast and can generate the waveform incrementally. We use publicly available data and provide a permissively-licensed open source toolkit for reproducing our results.
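The greedy left-to-right search mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' toolkit: the function names (`sq_dist`, `greedy_select`), the `join_weight` parameter, the use of plain lists as feature vectors, and the particular join cost (squared distance between a candidate and the database successor of the previously chosen unit, so that naturally contiguous units join at zero cost) are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_select(targets: List[Vector], units: List[Vector],
                  join_weight: float = 1.0) -> List[int]:
    """Pick one stored unit per target frame, left to right.

    Each choice minimises target cost plus weighted join cost given only
    the previously chosen unit; no backtracking is performed.
    """
    chosen: List[int] = []
    for target in targets:
        best_index, best_cost = -1, float("inf")
        for i, feat in enumerate(units):
            # Target cost: how well this stored unit matches the target frame.
            cost = sq_dist(feat, target)
            # Join cost (assumed form): distance to the unit that naturally
            # follows the previous choice in the database. Skipped for the
            # first frame or when the previous choice has no successor.
            if chosen and chosen[-1] + 1 < len(units):
                successor = units[chosen[-1] + 1]
                cost += join_weight * sq_dist(feat, successor)
            if cost < best_cost:
                best_index, best_cost = i, cost
        chosen.append(best_index)
    return chosen


if __name__ == "__main__":
    units = [[0.0], [1.0], [2.0], [5.0]]      # toy one-dimensional database
    targets = [[0.1], [1.2], [4.0]]           # toy target trajectory
    print(greedy_select(targets, units))                    # [0, 1, 2]
    print(greedy_select(targets, units, join_weight=0.0))   # [0, 1, 3]
```

In this toy example the join cost keeps the selection on the contiguous run of units 0, 1, 2, whereas with `join_weight=0.0` the last frame jumps to unit 3, the nearest neighbour of the target alone. Because each decision depends only on the previous choice, output can be produced one unit at a time, which is consistent with the incremental generation described in the abstract.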
Original language | English |
---|---|
Title of host publication | Interspeech 2018 |
Place of Publication | Hyderabad, India |
Publisher | ISCA |
Pages | 2022-2026 |
Number of pages | 5 |
Publication status | Published - 6 Sept 2018 |
Event | Interspeech 2018 - Hyderabad International Convention Centre, Hyderabad, India Duration: 2 Sept 2018 → 6 Sept 2018 http://interspeech2018.org/ |
Publication series
Name | |
---|---|
Publisher | ISCA |
ISSN (Electronic) | 1990-9772 |
Conference
Conference | Interspeech 2018 |
---|---|
Country/Territory | India |
City | Hyderabad |
Period | 2/09/18 → 6/09/18 |
Fingerprint
Dive into the research topics of 'Exemplar-based Speech Waveform Generation'. Together they form a unique fingerprint.
Projects
SCRIPT : Speech Synthesis for Spoken Content Production
Yamagishi, J., King, S. & Watts, O.
1/12/16 → 30/11/19
Project: Research