Reconstructing Voices within the Multiple-Average-Voice-Model framework

Pierre Lanchantin, Christophe Veaux, Mark J F Gales, Simon King, Junichi Yamagishi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Personalisation of voice output communication aids (VOCAs) makes it possible to preserve the vocal identity of people suffering from speech disorders. This can be achieved by adapting HMM-based speech synthesis systems using a small amount of adaptation data. When the voice has begun to deteriorate, reconstruction is still possible in the statistical domain by correcting the parameters of the models associated with the speech disorder. This can be done by substituting those parameters with parameters from a donor’s voice, at the risk of losing part of the patient’s identity. Recently, the Multiple-Average-Voice-Model (Multiple AVM) framework has been proposed for speaker adaptation. Adaptation is performed via interpolation in a speaker eigenspace spanned by the mean vectors of speaker-adapted AVMs, which can be tuned to the individual speaker. In this paper, we present the benefits of this framework for voice reconstruction: it requires only a very small amount of adaptation data, interpolation can be performed in a clean-speech eigenspace, and the resulting voice can be easily fine-tuned by adjusting the interpolation weights. We illustrate our points with a subjective assessment of the reconstructed voice.

Index Terms: HMM-based speech synthesis, speaker adaptation, multiple average voice model, cluster adaptive training, voice reconstruction, voice output communication aids.
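As a rough illustration of the interpolation idea described in the abstract, the sketch below forms an adapted mean vector as a weighted combination of AVM mean vectors. The helper name `interpolate_means`, the toy vectors, and the weights are hypothetical and are not taken from the paper; how the weights are estimated from adaptation data is not shown here.

```python
from typing import List, Sequence


def interpolate_means(avm_means: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> List[float]:
    """Return sum_k weights[k] * avm_means[k] (hypothetical helper)."""
    if len(avm_means) != len(weights):
        raise ValueError("need one weight per AVM mean vector")
    dim = len(avm_means[0])
    if any(len(m) != dim for m in avm_means):
        raise ValueError("all mean vectors must have the same dimension")
    # Weighted sum, computed one dimension at a time.
    return [sum(w * m[d] for w, m in zip(weights, avm_means))
            for d in range(dim)]


# Toy 3-dimensional mean vectors for three AVMs (made-up numbers).
avm_means = [[1.0, 0.0, 2.0],
             [0.0, 1.0, 2.0],
             [2.0, 2.0, 0.0]]

# Assumed interpolation weights for one target speaker.
adapted = interpolate_means(avm_means, [0.5, 0.25, 0.25])

# "Fine-tuning" in this sketch just means changing the weights by hand.
tuned = interpolate_means(avm_means, [0.25, 0.5, 0.25])
```

In this toy setting, shifting weight from the first AVM to the second moves the resulting mean vector toward the second AVM, which is the sense in which the weights act as tuning controls.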
Original language: English
Title of host publication: INTERSPEECH 2015 16th Annual Conference of the International Speech Communication Association
Publisher: International Speech Communication Association
Number of pages: 5
Publication status: Published - Sept 2015
Event: Interspeech 2015 - Dresden, Germany
Duration: 6 Sept 2015 to 9 Sept 2015


Conference: Interspeech 2015

