Analysis of Extended Baum-Welch and Constrained Optimization for Discriminative Training of HMMs

J. Pylkkönen, M. Kurimo

Research output: Contribution to journal › Article › Peer-review

Abstract

Discriminative training is an essential part of building a state-of-the-art speech recognition system. The Extended Baum–Welch (EBW) algorithm is the most popular method for carrying out this demanding large-scale optimization task. This paper presents a novel analysis of the EBW algorithm showing that EBW performs a specific kind of constrained optimization. The constraints reveal an interesting connection between the improvement of the discriminative criterion and the Kullback–Leibler divergence (KLD). Based on this analysis, a novel method for controlling the EBW algorithm is proposed. The analysis uses decomposed formulae for Gaussian mixture KLDs that correspond to those used in the Constrained Line Search (CLS) optimization algorithm. The CLS algorithm for discriminative training is therefore also briefly presented, and its connections to EBW are studied. Large-vocabulary speech recognition experiments are used to evaluate the proposed control method for EBW, which is shown to outperform the common heuristics in model robustness. A comparison of EBW and CLS also shows differences in robustness in favor of EBW. The constraints for Gaussian parameter optimization, as well as the special mixture weight estimation method used with EBW, are shown to be the key factors for good performance.
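As a rough illustration of the link the abstract draws between EBW updates and the KLD, the sketch below applies the textbook EBW re-estimation formulas to a single 1-D Gaussian and measures how far each update moves the model, as a KLD, for several values of the smoothing constant D. It also includes one common component-matched decomposition of a Gaussian mixture KLD (mixture-weight KLD plus weight-averaged component KLDs). This is an assumption about the general shape of the "decomposed formulae", not the paper's exact expressions. The names `GaussStats`, `ebw_update`, `kld_gauss` and `mixture_kld_decomposed`, the KLD direction, and all numbers are hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class GaussStats:
    """Occupancy-weighted statistics for one 1-D Gaussian (hypothetical layout)."""
    occ: float  # gamma: total occupancy
    x: float    # theta(x): weighted sum of observations
    x2: float   # theta(x^2): weighted sum of squared observations


def ebw_update(mu, var, num, den, d):
    """Textbook EBW re-estimation of a 1-D Gaussian with smoothing constant d.

    As d grows, the update tends towards the old parameters (a smaller step).
    """
    denom = num.occ - den.occ + d
    if denom <= 0:
        raise ValueError("d too small: non-positive effective occupancy")
    new_mu = (num.x - den.x + d * mu) / denom
    new_var = (num.x2 - den.x2 + d * (var + mu * mu)) / denom - new_mu * new_mu
    if new_var <= 0:
        raise ValueError("d too small: non-positive variance")
    return new_mu, new_var


def kld_gauss(mu0, var0, mu1, var1):
    """KL(N(mu0, var0) || N(mu1, var1)) for univariate Gaussians."""
    return 0.5 * (math.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)


def mixture_kld_decomposed(w0, comps0, w1, comps1):
    """Component-matched decomposition of a GMM KLD (an upper bound on the true KLD).

    Sums the mixture-weight KLD and the weight-averaged per-component Gaussian
    KLDs. Components are matched by index; each comp is a (mean, variance) pair.
    """
    total = 0.0
    for a, b, (m0, v0), (m1, v1) in zip(w0, w1, comps0, comps1):
        if a > 0.0:  # zero-weight components contribute nothing
            total += a * math.log(a / b) + a * kld_gauss(m0, v0, m1, v1)
    return total


if __name__ == "__main__":
    mu, var = 0.0, 1.0
    num = GaussStats(occ=10.0, x=5.0, x2=14.0)  # made-up numerator (reference) stats
    den = GaussStats(occ=6.0, x=-1.0, x2=7.0)   # made-up denominator (competitor) stats
    for d in (10.0, 20.0, 40.0, 80.0):
        new_mu, new_var = ebw_update(mu, var, num, den, d)
        kld = kld_gauss(mu, var, new_mu, new_var)
        print(f"D={d:5.1f}  mu={new_mu:.4f}  var={new_var:.4f}  KLD={kld:.5f}")
```

With these made-up statistics the KLD between the old and updated Gaussian shrinks as D grows, which is the kind of step-size versus divergence trade-off the abstract refers to. How D should actually be chosen is what the paper's proposed control method addresses; the sketch does not attempt to reproduce it.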

Original language: English
Pages (from-to): 2409-2419
Number of pages: 11
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 20
Issue number: 9
Publication status: Published, 1 Nov 2012

Keywords

  • Acoustics
  • Algorithm design and analysis
  • Hidden Markov models
  • Optimization
  • Speech recognition
  • Training
  • Acoustic modeling
  • constrained optimization
  • discriminative training
  • extended Baum–Welch
  • speech recognition
