Abstract
Audiobooks have attracted attention as promising data for training text-to-speech (TTS) systems. However, they usually lack an alignment between the audio and the text, and they are typically divided only into chapter-length units. In practice, the audio and text must be aligned before they can be used to build TTS synthesisers. Doing so is time-consuming, involves manual labour, and requires people skilled in speech processing. We previously proposed using graphemes to align speech and text data automatically. This paper further integrates a lightly supervised voice activity detection (VAD) technique that detects sentence boundaries as a pre-processing step before the grapheme-based alignment. This lightly supervised technique requires time stamps of speech and silence only for the first fifty sentences. By combining the two, we can semi-automatically build TTS systems from audiobooks with minimal manual intervention. Through subjective evaluations, we analyse how the grapheme-based aligner and/or the proposed VAD technique affect the quality of HMM-based speech synthesisers trained on audiobooks.
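The abstract describes the VAD only at a high level, so the following is a minimal sketch of the general idea of a lightly supervised GMM VAD under our own assumptions; it is not the authors' implementation. It assumes one log-energy value per frame as the only feature, fits one Gaussian mixture to frames labelled as speech and one to frames labelled as silence (standing in for the time-stamped first fifty sentences), labels the remaining frames by log-likelihood ratio, and treats sufficiently long pauses between speech as candidate sentence boundaries. The function names, the `threshold` and `min_sil_frames` parameters, and the synthetic data are hypothetical.

```python
import math
import random
from typing import List, Tuple

# Assumption: one log-energy value per frame; a 1-D GMM is (weights, means, variances).
Gmm = Tuple[List[float], List[float], List[float]]


def log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def gmm_loglik(x: float, gmm: Gmm) -> float:
    """Log-likelihood of one frame under a mixture of Gaussians."""
    weights, means, variances = gmm
    return logsumexp([math.log(w) + log_gauss(x, m, v)
                      for w, m, v in zip(weights, means, variances)])


def fit_gmm_1d(xs: List[float], k: int = 2, iters: int = 50, seed: int = 0) -> Gmm:
    """Fit a k-component 1-D GMM to labelled frames with expectation-maximisation."""
    rng = random.Random(seed)
    n = len(xs)
    mu = sum(xs) / n
    var0 = max(sum((x - mu) ** 2 for x in xs) / n, 1e-3)
    weights, means, variances = [1.0 / k] * k, rng.sample(xs, k), [var0] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each frame.
        resp = []
        for x in xs:
            logs = [math.log(w) + log_gauss(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = logsumexp(logs)
            resp.append([math.exp(lp - total) for lp in logs])
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-9)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-3)
            weights[j] = nj / n
    return weights, means, variances


def label_frames(frames: List[float], speech: Gmm, silence: Gmm,
                 threshold: float = 0.0) -> List[bool]:
    """True where the speech/silence log-likelihood ratio exceeds the threshold."""
    return [gmm_loglik(x, speech) - gmm_loglik(x, silence) > threshold for x in frames]


def sentence_boundaries(is_speech: List[bool], min_sil_frames: int) -> List[int]:
    """Midpoints of silences lasting >= min_sil_frames with speech on both sides."""
    bounds: List[int] = []
    start = None
    seen_speech = False
    for i, sp in enumerate(is_speech):
        if not sp:
            if start is None:
                start = i
            continue
        if start is not None and seen_speech and i - start >= min_sil_frames:
            bounds.append((start + i - 1) // 2)
        start = None
        seen_speech = True
    return bounds


if __name__ == "__main__":
    # Hypothetical stand-in for frames from the few time-stamped sentences.
    rng = random.Random(1)
    speech_gmm = fit_gmm_1d([rng.gauss(-2.0, 1.0) for _ in range(200)])
    silence_gmm = fit_gmm_1d([rng.gauss(-8.0, 0.5) for _ in range(200)])
    # Unlabelled stream: speech, long pause, speech, short pause, speech.
    stream = [-2.0] * 20 + [-8.0] * 30 + [-2.0] * 20 + [-8.0] * 5 + [-2.0] * 20
    labels = label_frames(stream, speech_gmm, silence_gmm)
    print(sentence_boundaries(labels, min_sil_frames=15))
```

In this sketch only the long pause is reported as a boundary, while the short pause is treated as a within-sentence break. A practical system would likely use richer acoustic features and smooth the frame labels before searching for pauses; the pure-Python EM simply keeps the example dependency-free.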
Original language | English |
---|---|
Title of host publication | Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Pages | 7987-7991 |
Number of pages | 5 |
DOIs | |
Publication status | Published - 2013 |
Keywords
- hidden Markov models
- signal detection
- speech synthesis
- HMM-based speech synthesisers
- VAD technique
- audiobook
- grapheme-based aligner approach
- lightly supervised GMM VAD
- lightly supervised voice activity detection technique
- minimum manual intervention
- semiautomatically build TTS systems
- sentence boundary detection
- speech processing
- text data
- text-to-speech system training
- Buildings
- Hidden Markov models
- Manuals
- Speech
- Speech synthesis
- Synthesizers
- HMM-based speech synthesis
- lightly supervised
- voice activity detection
Fingerprint
Dive into the research topics of 'Lightly supervised GMM VAD to use audiobook for speech synthesiser'. Together they form a unique fingerprint.
Projects
- 3 Finished
- Deep architectures for statistical speech synthesis
Yamagishi, J.
UK industry, commerce and public corporations
4/09/12 → 3/03/16
Project: Research
- Simple4All: Speech synthesis that improves through adaptive learning
1/11/11 → 31/10/14
Project: Research
- HELP4MOOD: A computational distributed system to support the treatment of patients with major depression
Matheson, C. & Wolters, M.
1/01/11 → 30/06/14
Project: Research
Activities
- 1 Invited talk
- EACL 2014 keynote: Speech synthesis needs YOU!
Simon King (Speaker)
29 Apr 2014
Activity: Academic talk or presentation types › Invited talk