Stochastic Pronunciation Modelling and Soft Match for Out-of-vocabulary Spoken Term Detection

Dong Wang, Simon King, Joe Frankel, Peter Bell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A major challenge faced by a spoken term detection (STD) system is the detection of out-of-vocabulary (OOV) terms. Although a subword-based STD system is able to detect OOV terms, a performance reduction is always observed compared to in-vocabulary terms. One challenge that OOV terms bring to STD is pronunciation uncertainty. One commonly used approach to this problem is a soft matching procedure; another is the stochastic pronunciation modelling (SPM) proposed by the authors. In this paper we compare these two approaches and combine them using a discriminative decision strategy. Experimental results demonstrate that SPM and soft match are highly complementary, and that their combination gives a significant performance improvement in OOV term detection.
Original language: English
Title of host publication: Proceedings of the 2010 IEEE International conference on Acoustic Speech and Signal Processing (ICASSP)
Place of Publication: NEW YORK
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 5294-5297
Number of pages: 4
ISBN (Electronic): 978-1-4244-4296-6
ISBN (Print): 978-1-4244-4295-9
Publication status: Published - 1 Mar 2010
Event: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing - Dallas
Duration: 14 Mar 2010 to 19 Mar 2010

Publication series

Name: IEEE International Conference on Acoustics Speech and Signal Processing
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
ISSN (Print): 1520-6149

Conference

Conference: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing
City: Dallas
Period: 14/03/10 to 19/03/10

Keywords

  • confidence estimation
  • soft match
  • speech recognition
  • spoken term detection
  • stochastic pronunciation modelling
