Cross-Lingual Subspace Gaussian Mixture Models for Low-Resource Speech Recognition

Liang Lu, A. Ghoshal, S. Renals

Research output: Contribution to journal › Article › peer-review

Abstract

This paper studies cross-lingual acoustic modeling in the context of subspace Gaussian mixture models (SGMMs). SGMMs factorize the acoustic model parameters into a set that is globally shared between all the states of a hidden Markov model (HMM) and another that is specific to the HMM states. We demonstrate that the SGMM global parameters are transferable between languages, particularly when the parameters are trained multilingually. As a result, acoustic models may be trained using limited amounts of transcribed audio by borrowing the SGMM global parameters from one or more source languages, and training only the state-specific parameters on the target-language audio. Model regularization using an ℓ1-norm penalty is shown to be particularly effective at avoiding overtraining, leading to lower word error rates. We investigate maximum a posteriori (MAP) adaptation of subspace parameters in order to reduce the mismatch between the SGMM global parameters of the source and target languages. In addition, monolingual and cross-lingual speaker adaptive training is used to reduce the model variance introduced by speakers. We systematically evaluate these techniques through experiments on the GlobalPhone corpus.
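The parameter factorization described in the abstract can be sketched in the standard SGMM formulation (following the notation commonly used by Povey et al.; the symbols here are an assumption, and the paper's own notation may differ):

```latex
% Emission density of HMM state j as a mixture over I shared full-covariance Gaussians:
%   p(x | j) = sum_i w_{ji} N(x; mu_{ji}, Sigma_i)
% with state-specific means and weights derived from a low-dimensional state vector v_j:
\[
p(\mathbf{x} \mid j) \;=\; \sum_{i=1}^{I} w_{ji}\,
  \mathcal{N}\!\left(\mathbf{x};\, \boldsymbol{\mu}_{ji},\, \boldsymbol{\Sigma}_i\right),
\qquad
\boldsymbol{\mu}_{ji} \;=\; \mathbf{M}_i \mathbf{v}_j,
\qquad
w_{ji} \;=\; \frac{\exp\!\left(\mathbf{w}_i^{\top}\mathbf{v}_j\right)}
                  {\sum_{i'=1}^{I} \exp\!\left(\mathbf{w}_{i'}^{\top}\mathbf{v}_j\right)}.
\]
```

Under this formulation, the globally shared parameters $\{\mathbf{M}_i, \mathbf{w}_i, \boldsymbol{\Sigma}_i\}$ are the ones borrowed from the source language(s), while only the state vectors $\mathbf{v}_j$ are estimated on the (limited) target-language audio.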
Original language: English
Pages (from-to): 17-27
Number of pages: 11
Journal: IEEE/ACM Transactions on Audio, Speech and Language Processing
Volume: 22
Issue number: 1
DOIs
Publication status: Published - 2014

Keywords

  • Gaussian processes
  • hidden Markov models
  • maximum likelihood estimation
  • speech recognition
  • ℓ1-norm penalty
  • GlobalPhone corpus
  • HMM states
  • MAP adaptation
  • SGMM global parameters
  • cross-lingual acoustic modeling
  • cross-lingual speaker adaptive training
  • cross-lingual subspace Gaussian mixture model
  • hidden Markov model
  • low-resource speech recognition
  • maximum a posteriori adaptation
  • model regularization
  • model variance reduction
  • monolingual speaker adaptive training
  • source languages
  • subspace parameters
  • target language audio
  • transcribed audio
  • word error rates
  • Acoustics
  • Adaptation models
  • Data models
  • Hidden Markov models
  • Speech
  • Speech recognition
  • Training data
  • Acoustic modeling
  • adaptation
  • cross-lingual speech recognition
  • regularization
  • subspace Gaussian mixture model

