Edinburgh Research Explorer

Using eigenvoices and nearest-neighbours in HMM-based cross-lingual speaker adaptation with limited data

Research output: Contribution to journal › Article

Open Access permissions: Open

Documents

    Rights statement: (c) 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.

    Accepted author manuscript, 677 KB, PDF-document

Original language: English
Pages (from-to): 839-851
Number of pages: 13
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 25
Issue number: 4
Early online date: 13 Feb 2017
Publication status: Published - Apr 2017

Abstract

Cross-lingual speaker adaptation for speech synthesis has many applications, such as speech-to-speech translation systems. Here, we focus on cross-lingual adaptation for statistical speech synthesis systems using limited adaptation data. To that end, we propose two eigenvoice adaptation approaches that exploit a bilingual Turkish-English speech database that we collected. In the first approach, eigenvoice weights extracted using Turkish adaptation data and Turkish voice models are transformed into eigenvoice weights for the English voice models using linear regression. Weighting the samples during linear regression according to the distance of the reference speakers to the target speakers was found to improve performance. Moreover, importance-weighting the elements of the eigenvectors during regression improved performance further. The second approach is speaker-specific state mapping, which performed significantly better than the baseline state-mapping algorithm in both objective and subjective tests. Performance of the proposed state-mapping algorithm improved further when it was used with the intra-lingual eigenvoice approach instead of the linear-regression-based algorithms used in the baseline system.
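
The first approach in the abstract maps eigenvoice weights from one language to another by linear regression, with reference speakers weighted by their distance to the target speaker. The abstract does not give the formulation, so the following is only a minimal sketch of distance-weighted least squares, not the authors' implementation. The function name `map_eigenvoice_weights`, the Gaussian distance kernel, the `bandwidth` parameter, the bias term, and the small `ridge` term are all assumptions made for illustration, and the importance weighting of eigenvector elements is not modelled.

```python
import numpy as np

def map_eigenvoice_weights(src_ref, tgt_ref, src_target, bandwidth=1.0, ridge=1e-6):
    """Hypothetical helper: map a speaker's source-language eigenvoice
    weights into the target-language eigenvoice space.

    src_ref:    (n_speakers, k_src) reference speakers' weights, source language
    tgt_ref:    (n_speakers, k_tgt) the same speakers' weights, target language
    src_target: (k_src,) the new speaker's weights, source language
    """
    src_ref = np.asarray(src_ref, dtype=float)
    tgt_ref = np.asarray(tgt_ref, dtype=float)
    src_target = np.asarray(src_target, dtype=float)

    # Sample weights: reference speakers closer to the target speaker count
    # more. The Gaussian kernel and bandwidth are assumptions of this sketch.
    dist = np.linalg.norm(src_ref - src_target, axis=1)
    sample_w = np.exp(-0.5 * (dist / bandwidth) ** 2)

    # Append a constant column so the regression includes a bias term.
    X = np.hstack([src_ref, np.ones((src_ref.shape[0], 1))])
    x_new = np.append(src_target, 1.0)

    # Weighted least squares, A = (X^T W X + ridge * I)^{-1} X^T W Y.
    # The ridge term is only there for numerical stability (assumption).
    XtW = X.T * sample_w
    A = np.linalg.solve(XtW @ X + ridge * np.eye(X.shape[1]), XtW @ tgt_ref)

    # Predicted target-language eigenvoice weights for the new speaker.
    return x_new @ A
```

In a setting like the one described, `src_ref` and `tgt_ref` would hold the Turkish and English eigenvoice weights of the bilingual reference speakers, and `src_target` the weights estimated from the target speaker's Turkish adaptation data.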

    Research areas

  • statistical speech synthesis, speaker adaptation, nearest neighbour, cross-lingual speaker adaptation

ID: 30390531