A perceptual investigation of wavelet-based decomposition of f0 for text-to-speech synthesis

Manuel Sam Ribeiro, Junichi Yamagishi, Robert A. J. Clark

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The Continuous Wavelet Transform (CWT) has been recently proposed to model f0 in the context of speech synthesis. It was shown that systems using signal decomposition with the CWT tend to outperform systems that model the signal directly. The f0 signal is typically decomposed into various scales of differing frequency. In these experiments, we reconstruct f0 with selected frequencies and ask native listeners to judge the naturalness of synthesized utterances with respect to natural speech. Results indicate that HMM-generated f0 is comparable to the CWT low frequencies, suggesting it mostly generates utterances with neutral intonation. Middle frequencies achieve very high levels of naturalness, while very high frequencies are mostly noise.
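The idea of decomposing f0 into scales and reconstructing it from a selected subset can be illustrated with a minimal sketch. This is not the authors' implementation: the Mexican hat mother wavelet, the dyadic scales, the plain summation used for reconstruction, and the function names `mexican_hat`, `cwt_decompose`, and `reconstruct_band` are all assumptions made for illustration, and the input is a synthetic contour rather than real speech.

```python
import numpy as np


def mexican_hat(length, scale):
    """Sampled Mexican hat (Ricker) wavelet centred in a window of `length` samples."""
    x = (np.arange(length) - (length - 1) / 2.0) / scale
    norm = 2.0 / (np.sqrt(3.0 * scale) * np.pi ** 0.25)
    return norm * (1.0 - x ** 2) * np.exp(-(x ** 2) / 2.0)


def cwt_decompose(f0, scales):
    """Return one row of wavelet coefficients per scale (scales given in frames)."""
    f0 = np.asarray(f0, dtype=float)
    centred = f0 - f0.mean()  # the wavelet has (near) zero mean, so remove the offset
    rows = []
    for scale in scales:
        # Odd window of about ten scale widths, never longer than the signal.
        length = min(int(10 * scale) | 1, len(centred))
        rows.append(np.convolve(centred, mexican_hat(length, scale), mode="same"))
    return np.vstack(rows)


def reconstruct_band(coeffs, keep):
    """Approximate partial reconstruction: sum the coefficient rows listed in `keep`."""
    return coeffs[list(keep)].sum(axis=0)


# Synthetic "f0": a slow phrase-like movement plus a fast, noise-like ripple.
frames = np.arange(1000)
slow = 10.0 * np.sin(2 * np.pi * frames / 200)
fast = 5.0 * np.sin(2 * np.pi * frames / 8)
f0 = 120.0 + slow + fast

scales = [1, 2, 4, 8, 16, 32]            # assumed dyadic scales, fine to coarse
coeffs = cwt_decompose(f0, scales)
low_band = reconstruct_band(coeffs, [4, 5])   # coarse scales: slow movement
high_band = reconstruct_band(coeffs, [0, 1])  # fine scales: fast ripple
```

Summing rows is only an unnormalized approximation of the inverse transform, so the bands follow the shape of the underlying components but not their amplitude in Hz. Published CWT-based f0 systems typically apply a scale-dependent weight before summing.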
Original language: English
Title of host publication: INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association
Publisher: International Speech Communication Association
Pages: 1586-1590
Number of pages: 5
Publication status: Published - 2015
