Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis

Keiichiro Oura, Keiichi Tokuda, Junichi Yamagishi, Simon King, Mirjam Wester

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

In the EMIME project, we are developing a mobile device that performs personalized speech-to-speech translation such that a user's spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user's voice. We integrate two techniques, unsupervised adaptation for HMM-based TTS using a word-based large-vocabulary continuous speech recognizer and cross-lingual speaker adaptation for HMM-based TTS, into a single architecture. Thus, an unsupervised cross-lingual speaker adaptation system can be developed. Listening tests show very promising results, demonstrating that adapted voices sound similar to the target speaker and that differences between supervised and unsupervised cross-lingual speaker adaptation are small.
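The architecture described above chains two steps: a word-based LVCSR produces an estimated transcript of the user's input-language speech (making the adaptation unsupervised), and the speaker transforms estimated from that audio are then carried over to an output-language average-voice model. The sketch below illustrates that data flow only; every function and data structure is a hypothetical placeholder, not the authors' implementation.

```python
# Illustrative sketch of the unsupervised cross-lingual adaptation pipeline.
# All function names and return values are hypothetical stand-ins; the real
# system uses an LVCSR, HMM-based TTS models, and linear-transform adaptation.

def recognize(audio_l1):
    """Stand-in for the word-based LVCSR: returns an estimated transcript,
    so no manually labelled adaptation data is required."""
    return "estimated transcript"

def adapt_hmm(avg_voice_l1, audio_l1, transcript):
    """Stand-in for HMM speaker adaptation: estimates transforms that move
    the input-language average voice toward the target speaker."""
    return {"base": avg_voice_l1, "transforms": f"estimated from {audio_l1}"}

def map_cross_lingual(adapted_l1, avg_voice_l2):
    """Stand-in for cross-lingual mapping: applies the speaker transforms
    estimated in language 1 to the language-2 average-voice model."""
    return {"base": avg_voice_l2, "transforms": adapted_l1["transforms"]}

def unsupervised_cross_lingual_adaptation(audio_l1, avg_voice_l1, avg_voice_l2):
    transcript = recognize(audio_l1)  # unsupervised: ASR output, not labels
    adapted_l1 = adapt_hmm(avg_voice_l1, audio_l1, transcript)
    return map_cross_lingual(adapted_l1, avg_voice_l2)

voice_l2 = unsupervised_cross_lingual_adaptation(
    "user.wav", "L1 average voice", "L2 average voice")
```

The resulting model pairs the output-language base model with speaker transforms derived entirely from untranscribed input-language speech, which is the integration the paper evaluates.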
Original language: English
Title of host publication: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing
Publisher: IEEE
Pages: 4594-4597
Number of pages: 4
ISBN (Print): 9781424442959
DOIs
Publication status: Published - 28 Jun 2010
Event: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing - Dallas, TX, USA
Duration: 14 Mar 2010 – 19 Mar 2010

Publication series

Name: IEEE International Conference on Acoustics, Speech and Signal Processing
Publisher: IEEE
ISSN (Print): 1520-6149
ISSN (Electronic): 2379-190X

Conference

Conference: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing
Period: 14/03/10 – 19/03/10

Keywords / Materials (for Non-textual outputs)

  • speech synthesis
  • hidden Markov models
  • natural languages
  • automatic speech recognition
  • loudspeakers
  • speech recognition
  • decision trees
  • speech analysis
  • databases
  • computer science
  • HMM-based speech synthesis
  • unsupervised cross-lingual speaker adaptation
