Abstract / Description of output
This paper presents a hybrid text-to-speech framework that uses a waveform generation method based on exemplars of natural speech waveform. These exemplars are selected at synthesis time given a sequence of acoustic features generated from text by a statistical parametric speech synthesis model. In order to match the expected degradation of these target synthesis features, the database of units is constructed such that the units’ target representations are generated from the same parametric model. We evaluate two variants of this framework by modifying the size of the exemplar: a small unit variant (where unit boundaries are determined by pitch mark location) and a halfphone variant (where unit boundaries are determined by subphone state forced alignment). We found that for a larger dataset (around four hours of training data) the exemplar-based waveform generation variants are rated higher than the vocoder-based system.
|Title of host publication||2018 IEEE Workshop on Spoken Language Technology (SLT)|
|Publisher||Institute of Electrical and Electronics Engineers (IEEE)|
|Number of pages||7|
|ISBN (Electronic)||978-1-5386-4334-1, 978-1-5386-4333-4|
|Publication status||Published - 14 Feb 2019|
|Event||2018 IEEE Workshop on Spoken Language Technology (SLT) - Athens, Greece; Duration: 18 Dec 2018 → 21 Dec 2018|
|Conference||2018 IEEE Workshop on Spoken Language Technology (SLT)|
|Abbreviated title||IEEE SLT 2018|
|Period||18/12/18 → 21/12/18|
Keywords / Materials (for Non-textual outputs)
- unit selection
Fingerprint
Dive into the research topics of 'Exemplar-based speech waveform generation for text-to-speech'. Together they form a unique fingerprint.
- 1 Finished
Yamagishi, J., King, S. & Watts, O.
1/12/16 → 30/11/19