Edinburgh Research Explorer

Cross-Lingual Subspace Gaussian Mixture Models for Low-Resource Speech Recognition

Research output: Contribution to journal › Article

Original language: English
Pages (from-to): 17-27
Number of pages: 11
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 22
Issue number: 1
DOIs
Publication status: Published - 2014

Abstract

This paper studies cross-lingual acoustic modeling in the context of subspace Gaussian mixture models (SGMMs). SGMMs factorize the acoustic model parameters into a set that is globally shared between all the states of a hidden Markov model (HMM) and another that is specific to the HMM states. We demonstrate that the SGMM global parameters are transferable between languages, particularly when the parameters are trained multilingually. As a result, acoustic models may be trained using limited amounts of transcribed audio by borrowing the SGMM global parameters from one or more source languages and training only the state-specific parameters on the target language audio. Model regularization using an ℓ1-norm penalty is shown to be particularly effective at avoiding overtraining, leading to lower word error rates. We investigate maximum a posteriori (MAP) adaptation of the subspace parameters in order to reduce the mismatch between the SGMM global parameters of the source and target languages. In addition, monolingual and cross-lingual speaker adaptive training is used to reduce the model variance introduced by speakers. We systematically evaluate these techniques through experiments on the GlobalPhone corpus.
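For context, the factorization the abstract describes can be sketched with the standard SGMM parameterization (this notation is the common convention from the SGMM literature, not taken from this record): each HMM state j carries only a low-dimensional state vector, while the subspace projections and covariances are globally shared.

```latex
% Standard SGMM parameterization (common convention; a sketch, not this paper's exact notation).
% State j, shared Gaussian index i:
\mu_{ji} = \mathbf{M}_i \, \mathbf{v}_j
\qquad
w_{ji} = \frac{\exp\!\left(\mathbf{w}_i^{\top} \mathbf{v}_j\right)}
              {\sum_{i'} \exp\!\left(\mathbf{w}_{i'}^{\top} \mathbf{v}_j\right)}
% Globally shared parameters: \mathbf{M}_i, \mathbf{w}_i, and covariances \Sigma_i.
% State-specific parameters:  the vectors \mathbf{v}_j.
```

Under this sketch, the cross-lingual recipe in the abstract amounts to borrowing the shared set (the M_i, w_i, and covariances) from the source languages and re-estimating only the v_j on the limited target-language data.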

Research areas

  • Gaussian processes, hidden Markov models, maximum likelihood estimation, speech recognition, ℓ1-norm penalty, GlobalPhone corpus, HMM states, MAP adaptation, SGMM global parameters, cross-lingual acoustic modeling, cross-lingual speaker adaptive training, cross-lingual subspace Gaussian mixture model, low-resource speech recognition, maximum a posteriori adaptation, model regularization, model variance reduction, monolingual speaker adaptive training, source languages, subspace parameters, target language audio, transcribed audio, word error rates, acoustics, adaptation models, data models, training data, acoustic modeling, adaptation, cross-lingual speech recognition, regularization, subspace Gaussian mixture model


ID: 12330000