TY - GEN
T1 - An unsupervised method to select a speaker subset from large multi-speaker speech synthesis datasets
AU - Gallegos, Pilar Oplustil
AU - Williams, Jennifer
AU - Rownicka, Joanna
AU - King, Simon
N1 - Funding Information: This work was supported in part by: ANID, Becas Chile, no. 72190135; the EPSRC Centre for Doctoral Training in Data Science, funded by the UK Engineering and Physical Sciences Research Council (grant EP/L016427/1) and the University of Edinburgh; and a PhD studentship from the DataLab Innovation Centre, Ericsson Media Services, and Quorate Technology.
PY - 2020/10/29
Y1 - 2020/10/29
N2 - Large multi-speaker datasets for TTS typically contain diverse speakers, recording conditions, styles and quality of data. Although one might generally presume that more data is better, in this paper we show that a model trained on a carefully-chosen subset of speakers from LibriTTS provides significantly better quality synthetic speech than a model trained on a larger set. We propose an unsupervised methodology to find this subset by clustering per-speaker acoustic representations.
AB - Large multi-speaker datasets for TTS typically contain diverse speakers, recording conditions, styles and quality of data. Although one might generally presume that more data is better, in this paper we show that a model trained on a carefully-chosen subset of speakers from LibriTTS provides significantly better quality synthetic speech than a model trained on a larger set. We propose an unsupervised methodology to find this subset by clustering per-speaker acoustic representations.
KW - clustering
KW - data
KW - multi-speaker
KW - sequence-to-sequence models
KW - speaker representation
KW - speech synthesis
U2 - 10.21437/Interspeech.2020-2567
DO - 10.21437/Interspeech.2020-2567
M3 - Conference contribution
AN - SCOPUS:85098189222
VL - 2020-October
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 1758
EP - 1762
BT - Proceedings of the Annual Conference of the International Speech Communication Association
T2 - 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Y2 - 25 October 2020 through 29 October 2020
ER -