Abstract
Our recent experiments with HMM-based speech synthesis systems have demonstrated that speaker-adaptive HMM-based speech synthesis (which uses an 'average voice model' plus model adaptation) is robust to non-ideal speech data that are recorded under various conditions and with varying microphones, that are not perfectly clean, and/or that lack phonetic balance. This enables us to consider building high-quality voices on 'non-TTS' corpora such as ASR corpora. Since ASR corpora generally include a large number of speakers, this leads to the possibility of producing an enormous number of voices automatically. In this paper we show thousands of voices for HMM-based speech synthesis that we have made from several popular ASR corpora such as the Wall Street Journal databases (WSJ0/WSJ1/WSJCAM0), Resource Management, Globalphone and Speecon. We report some perceptual evaluation results and outline the outstanding issues.
Original language | English |
---|---|
Title of host publication | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2009 |
Subtitle of host publication | 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009; Brighton, United Kingdom |
Pages | 420-423 |
Number of pages | 4 |
Publication status | Published - Sep 2009 |
Projects
- 2 Finished
- EMIME: Effective multilingual interaction in mobile environments. RTD Linked to RE7006
  1/03/08 → 28/02/11
  Project: Research
- Streamed models for automatic speech recognition (EPSRC Advanced Research Fellowship)
  1/01/05 → 31/12/09
  Project: Research