Abstract / Description of output
In this paper we analyse the effect of speech corpus and compression method on the intelligibility of synthesized speech at fast rates. We recorded English and German language voice talents at a normal and a fast speaking rate and trained an HSMM-based synthesis system based on the normal and the fast data of each speaker. We compared three compression methods: scaling the variance of the state duration model, interpolating the duration models of the fast and the normal voices, and applying a linear compression method to generated speech. Word recognition results for the English voices show that generating speech at normal speaking rate and then applying linear compression resulted in the most intelligible speech at all tested rates. A similar result was found when evaluating the intelligibility of the natural speech corpus. For the German voices, interpolation was found to be better at moderate speaking rates but the linear method was again more successful at very high rates, for both blind and sighted participants. These results indicate that using fast speech data does not necessarily create more intelligible voices and that linear compression can more reliably provide higher intelligibility, particularly at higher rates.
Index Terms: fast speech, HMM-based speech synthesis, blind users
Original language | English |
---|---|
Title of host publication | Interspeech |
Place of Publication | Singapore |
Publisher | International Speech Communication Association |
Number of pages | 5 |
Publication status | Published - Sept 2014 |
Fingerprint
Dive into the research topics of 'Intelligibility analysis of fast synthesized speech'. Together they form a unique fingerprint.
Projects
- 1 Finished
- Synthesis of Fast Speech/Speech Synthesis of Auditive:Lecture Books (SALB)
  Yamagishi, J.
  1/02/13 → 31/03/14
  Project: Research