Stochastic pronunciation modeling for out-of-vocabulary spoken term detection

Dong Wang, Simon King, Joe Frankel

Research output: Contribution to journal, Article, peer-review

Abstract

Spoken term detection (STD) is the name given to the task of searching large amounts of audio for occurrences of spoken terms, which are typically single words or short phrases. One reason that STD is a hard task is that search terms tend to contain a disproportionate number of out-of-vocabulary (OOV) words. The most common approach to STD uses subword units. This, in conjunction with some method for predicting pronunciations of OOVs from their written form, enables the detection of OOV terms but performance is considerably worse than for in-vocabulary terms. This performance differential can be largely attributed to the special properties of OOVs. One such property is the high degree of uncertainty in the pronunciation of OOVs. We present a stochastic pronunciation model (SPM) which explicitly deals with this uncertainty. The key insight is to search for all possible pronunciations when detecting an OOV term, explicitly capturing the uncertainty in pronunciation. This requires a probabilistic model of pronunciation, able to estimate a distribution over all possible pronunciations. We use a joint-multigram model (JMM) for this and compare the JMM-based SPM with the conventional soft match approach. Experiments using speech from the meetings domain demonstrate that the SPM performs better than soft match in most operating regions, especially at low false alarm probabilities. Furthermore, SPM and soft match are found to be complementary: their combination provides further performance gains.
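The key idea in the abstract, searching for every candidate pronunciation of an OOV term rather than a single predicted one, can be illustrated with a toy sketch. This is not the system described in the article: the phone sequence, the pronunciation distribution, and the names `find_occurrences` and `detect_term` are all hypothetical. Exact matching on a single phone string stands in for a real subword search, and the hand-written probabilities stand in for the output of a pronunciation model such as the JMM.

```python
from collections import defaultdict


def find_occurrences(phones, pron):
    """Return the start indices at which `pron` occurs exactly in `phones`."""
    phones, pron = tuple(phones), tuple(pron)
    n = len(pron)
    return [i for i in range(len(phones) - n + 1) if phones[i:i + n] == pron]


def detect_term(phones, pron_dist, threshold=0.0):
    """Search for every candidate pronunciation of a term.

    `pron_dist` maps each candidate pronunciation (a tuple of phones) to its
    probability. Each hit is scored by the summed probability of the
    pronunciations matching at that start index (a simplifying assumption).
    """
    scores = defaultdict(float)
    for pron, prob in pron_dist.items():
        for start in find_occurrences(phones, pron):
            scores[start] += prob
    return {s: p for s, p in sorted(scores.items()) if p >= threshold}


# Hypothetical phone sequence containing the term spoken in two different ways.
phones = ["h", "eh", "l", "ow",
          "t", "ax", "m", "ey", "t", "ow",
          "ae", "n", "d",
          "t", "ax", "m", "aa", "t", "ow"]

# Hypothetical distribution over two pronunciations of the same written term.
pron_dist = {
    ("t", "ax", "m", "ey", "t", "ow"): 0.6,
    ("t", "ax", "m", "aa", "t", "ow"): 0.4,
}

hits = detect_term(phones, pron_dist)
```

In this sketch, a search restricted to the most probable pronunciation would find only the first occurrence. Searching over the whole distribution also recovers the second, with a lower score that a decision threshold can accept or reject.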
Original language: English
Article number: 5510125
Pages (from-to): 688-698
Number of pages: 11
Journal: IEEE Transactions on Audio, Speech, and Language Processing
Volume: 19
Issue number: 4
Early online date: 15 Jul 2010
Publication status: Published - 1 May 2011

Keywords / Materials (for Non-textual outputs)

  • stochastic processes
  • lattices
  • scanning probe microscopy
  • tellurium
  • uncertainty
  • continuous-stirred tank reactor
  • acoustic signal detection
  • speech recognition
  • NIST
  • automatic speech recognition
  • letter-to-sound
  • out-of-vocabulary (OOV)
  • pronunciation modeling
  • spoken term detection (STD)
