Abstract
The quality of text-to-speech voices built from noisy recordings is diminished. To improve it, we propose using a recurrent neural network to enhance acoustic parameters prior to training. We trained a deep recurrent neural network using a parallel database of noisy and clean acoustic parameters as the input and output of the network. The database consisted of multiple speakers and diverse noise conditions. We investigated using text-derived features as an additional input to the network. We processed a noisy database of two other speakers using this network and used its output to train an HMM acoustic text-to-speech model for each voice. Listening experiment results showed that the voice built with enhanced parameters was ranked significantly higher than the ones trained with noisy speech and with speech that had been enhanced using a conventional enhancement system. The text-derived features improved results only for the female voice, where the resulting voice was ranked as highly as a voice trained with clean speech.
Original language | English |
---|---|
Title of host publication | Proceedings of Interspeech 2016 |
Place of Publication | San Francisco, United States |
Pages | 352-356 |
Number of pages | 5 |
Publication status | Published - 8 Sept 2016 |
Event | Interspeech 2016, San Francisco, United States; Duration: 8 Sept 2016 → 12 Sept 2016; http://www.interspeech2016.org/ |
Publication series
Name | Interspeech |
---|---|
Publisher | International Speech Communication Association |
ISSN (Print) | 1990-9772 |
Conference
Conference | Interspeech 2016 |
---|---|
Country/Territory | United States |
City | San Francisco |
Period | 8/09/16 → 12/09/16 |
Publication title: Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System using Deep Recurrent Neural Networks
Projects
2 finished
- User Generated Dialogue System: uDialogue
  Renals, S. (Principal Investigator) & Yamagishi, J. (Co-investigator)
  Non-EU industry, commerce and public corporations
  1/04/16 → 31/03/17
  Project: Research
- Natural Speech Technology
  Renals, S. (Principal Investigator) & King, S. (Co-investigator)
  1/05/11 → 31/07/16
  Project: Research
Datasets
- Noisy speech database for training speech enhancement algorithms and TTS models
  Valentini Botinhao, C. (Creator), Edinburgh DataShare, 21 Aug 2017
  DOI: 10.7488/ds/2117
  Dataset
- Reverberant speech database for training speech dereverberation algorithms and TTS models
  Valentini Botinhao, C. (Creator), Edinburgh DataShare, 22 Mar 2016
  DOI: 10.7488/ds/1425
  Dataset