Personalising speech-to-speech translation: Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis

John Dines, Hui Liang, Lakshmi Saheer, Matthew Gibson, William Byrne, Keiichiro Oura, Keiichi Tokuda, Junichi Yamagishi, Simon King, Mirjam Wester, Teemu Hirsimäki, Reima Karhila, Mikko Kurimo

Research output: Contribution to journal › Article › peer-review

Abstract / Description of output

In this paper we present results of unsupervised cross-lingual speaker adaptation applied to text-to-speech synthesis. The application of our research is the personalisation of speech-to-speech translation, in which we employ an HMM statistical framework for both speech recognition and synthesis. This framework provides a logical mechanism to adapt the synthesised speech output to the voice of the user by way of speech recognition. We present results from several different unsupervised and cross-lingual adaptation approaches, as well as from an end-to-end speaker-adaptive speech-to-speech translation system. Our experiments show that speaker adaptation can be applied successfully in both unsupervised and cross-lingual scenarios, and our proposed algorithms seem to generalise well across several language pairs. We also discuss important future directions, including the need for better evaluation metrics.
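As a rough illustration of the kind of cross-lingual adaptation the abstract describes (and not the authors' actual implementation), the sketch below carries per-regression-class CMLLR-style transforms, assumed to have been estimated on recognised speech in the input language, over to an output-language synthesis average-voice model through an assumed state mapping. All names, sizes and the toy data are illustrative assumptions.

```python
import numpy as np

# Hypothetical, simplified sketch: per-regression-class affine transforms
# (A, b), assumed to have been estimated on recognised input-language speech,
# are shared with the output-language synthesis model via an assumed
# state-to-state mapping.  This is NOT the authors' implementation.

rng = np.random.default_rng(0)
dim = 3                                   # toy feature dimension
n_asr_classes, n_tts_states = 4, 5        # toy model sizes

# Toy average-voice synthesis means (one row per synthesis state).
tts_means = rng.normal(size=(n_tts_states, dim))

# One affine transform per ASR-side regression class; in practice these
# would come from maximum-likelihood linear regression on ASR output.
A = np.stack([np.eye(dim) + 0.1 * rng.normal(size=(dim, dim))
              for _ in range(n_asr_classes)])
b = 0.05 * rng.normal(size=(n_asr_classes, dim))

# Hypothetical cross-lingual state mapping: each synthesis state borrows
# the transform of its "closest" recognition class (indices made up here).
state_map = np.array([0, 1, 1, 2, 3])

# Adapt each synthesis mean with its mapped transform.
adapted_means = np.einsum('sij,sj->si', A[state_map], tts_means) + b[state_map]

print("original means:\n", tts_means)
print("adapted means:\n", adapted_means)
```

In a real system the transforms would be estimated from the user's recognised speech rather than sampled, and the state mapping would be derived from acoustic or phonetic similarity between the two languages' models.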
Original language: English
Pages (from-to): 420-437
Number of pages: 18
Journal: Computer Speech and Language
Volume: 27
Issue number: 2
Early online date: 17 Sept 2011
DOIs
Publication status: Published - Feb 2013

Keywords

  • Speech-to-speech translation
  • Cross-lingual speaker adaptation
  • HMM-based speech synthesis

