The Continuous Wavelet Transform (CWT) has recently been proposed to model f0 in the context of speech synthesis. It was shown that systems using signal decomposition with the CWT tend to outperform systems that model the signal directly. The f0 signal is typically decomposed into various scales of differing frequency. In these experiments, we reconstruct f0 with selected frequencies and ask native listeners to judge the naturalness of synthesized utterances with respect to natural speech. Results indicate that HMM-generated f0 is comparable to the CWT low frequencies, suggesting it mostly generates utterances with neutral intonation. Middle frequencies achieve very high levels of naturalness, while very high frequencies are mostly noise.
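The idea of decomposing f0 into scales and rebuilding it from a chosen subset can be illustrated with a minimal sketch. It is not the paper's implementation. The Mexican-hat wavelet, the dyadic scales, the synthetic contour, the grouping of scales into low, middle and high bands, and the helper names `ricker`, `cwt_scales` and `reconstruct` are all assumptions made for illustration. Summing bands gives only an approximate reconstruction, up to a scaling factor.

```python
import numpy as np

def ricker(points, a):
    """Mexican-hat (Ricker) wavelet with width `a`, sampled at `points` positions."""
    t = np.arange(points) - (points - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return norm * (1.0 - (t / a) ** 2) * np.exp(-0.5 * (t / a) ** 2)

def cwt_scales(signal, scales):
    """Return one band per scale: the signal convolved with the scaled wavelet."""
    bands = []
    for a in scales:
        n = int(min(10 * a, len(signal)))  # truncate the wavelet support
        bands.append(np.convolve(signal, ricker(n, a), mode="same") / np.sqrt(a))
    return np.array(bands)

def reconstruct(bands, selected):
    """Approximate the contour from a subset of scales by summing their bands."""
    return bands[selected].sum(axis=0)

# Hypothetical stand-in for an interpolated, mean-normalised log-f0 contour:
# a slow phrase-like movement, a faster accent-like movement, and noise.
rng = np.random.default_rng(0)
frames = np.arange(500)
f0 = (np.sin(2 * np.pi * frames / 250.0)
      + 0.5 * np.sin(2 * np.pi * frames / 40.0)
      + 0.1 * rng.standard_normal(frames.size))
f0 = f0 - f0.mean()

scales = 2 ** np.arange(1, 7)           # 2, 4, ..., 64 frames (assumed)
bands = cwt_scales(f0, scales)

high = reconstruct(bands, [0, 1])       # smallest scales: fastest variation
middle = reconstruct(bands, [2, 3])
low = reconstruct(bands, [4, 5])        # largest scales: slowest variation
```

Each of `low`, `middle` and `high` is a contour of the same length as `f0`, so any of them could stand in for the full contour when comparing frequency ranges.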
|Title of host publication|INTERSPEECH 2015 16th Annual Conference of the International Speech Communication Association|
|Publisher|International Speech Communication Association|
|Number of pages|5|
|Publication status|Published - 2015|