Noisy speech database for training speech enhancement algorithms and TTS models

Dataset

Abstract

Clean and noisy parallel speech database. The database was designed to train and test speech enhancement methods that operate at 48kHz. A more detailed description can be found in the papers associated with the database. For the 28 speaker dataset, details can be found in: C. Valentini-Botinhao, X. Wang, S. Takaki & J. Yamagishi, "Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System using Deep Recurrent Neural Networks", In Proc. Interspeech 2016. For the 56 speaker dataset: C. Valentini-Botinhao, X. Wang, S. Takaki & J. Yamagishi, "Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech”, In Proc. SSW 2016. Some of the noises used to create the noisy speech were obtained from the Demand database, available here: http://parole.loria.fr/DEMAND/ . The speech database was obtained from the CSTR VCTK Corpus, available here: http://dx.doi.org/10.7488/ds/1994. The speech-shaped and babble noise files that were used to create this dataset are available here: http://homepages.inf.ed.ac.uk/cvbotinh/se/noises/.

Data Citation

Valentini-Botinhao, Cassia. (2017). Noisy speech database for training speech enhancement algorithms and TTS models, 2016 [sound]. University of Edinburgh. School of Informatics. Centre for Speech Technology Research (CSTR). http://dx.doi.org/10.7488/ds/2117.
Date made available21 Aug 2017
PublisherEdinburgh DataShare
Geographical coverageUnited Kingdom

Cite this