TY - GEN
T1 - An HMM-based speech synthesiser using Glottal Post-Filtering
AU - Cabral, João P.
AU - Renals, Steve
AU - Richmond, Korin
AU - Yamagishi, Junichi
N1 - Funding Information:
The first author is now supported by the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation (www.cngl.ie) at University College Dublin, Ireland. This paper is based on his PhD research undertaken at the University of Edinburgh, supported by Marie Curie Early Stage Training Site EdSST (MEST-CT-2005-020568).
Publisher Copyright:
© 2010 7th ISCA Workshop on Speech Synthesis, SSW 2010. All rights reserved.
PY - 2010/9/24
Y1 - 2010/9/24
N2 - Control over voice quality, e.g. breathy and tense voice, is important for speech synthesis applications. For example, transformations can be used to modify aspects of the voice related to the speaker's identity and to improve expressiveness. However, it is hard to modify voice characteristics of the synthetic speech without degrading speech quality. State-of-the-art statistical speech synthesisers, in particular, do not typically allow control over parameters of the glottal source, which are strongly correlated with voice quality. Consequently, the control of voice characteristics in these systems is limited. In contrast, the HMM-based speech synthesiser proposed in this paper uses an acoustic glottal source model. The system passes the glottal signal through a whitening filter to obtain the excitation of voiced sounds. This technique, called glottal post-filtering, allows voice characteristics of the synthetic speech to be transformed by modifying the source model parameters. We evaluated the proposed synthesiser in a perceptual experiment, in terms of speech naturalness, intelligibility, and similarity to the original speaker's voice. The results show that it performed as well as an HMM-based synthesiser that generates the speech signal with a commonly used high-quality speech vocoder.
AB - Control over voice quality, e.g. breathy and tense voice, is important for speech synthesis applications. For example, transformations can be used to modify aspects of the voice related to the speaker's identity and to improve expressiveness. However, it is hard to modify voice characteristics of the synthetic speech without degrading speech quality. State-of-the-art statistical speech synthesisers, in particular, do not typically allow control over parameters of the glottal source, which are strongly correlated with voice quality. Consequently, the control of voice characteristics in these systems is limited. In contrast, the HMM-based speech synthesiser proposed in this paper uses an acoustic glottal source model. The system passes the glottal signal through a whitening filter to obtain the excitation of voiced sounds. This technique, called glottal post-filtering, allows voice characteristics of the synthetic speech to be transformed by modifying the source model parameters. We evaluated the proposed synthesiser in a perceptual experiment, in terms of speech naturalness, intelligibility, and similarity to the original speaker's voice. The results show that it performed as well as an HMM-based synthesiser that generates the speech signal with a commonly used high-quality speech vocoder.
KW - glottal post-filter
KW - HMM-based speech synthesis
KW - voice quality
UR - http://www.scopus.com/inward/record.url?scp=80051655408&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:80051655408
T3 - The Seventh ISCA Tutorial and Research Workshop on Speech Synthesis
SP - 365
EP - 370
BT - Proceedings of The Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, Kyoto, Japan, September 22-24, 2010
A2 - Sagisaka, Yoshinori
A2 - Tokuda, Keiichi
PB - ISCA
T2 - 7th ISCA Tutorial and Research Workshop on Speech Synthesis, SSW 2010
Y2 - 22 September 2010 through 24 September 2010
ER -