Transcription of multi-genre media archives using out-of-domain data

P.J. Bell, M.J.F. Gales, P. Lanchantin, X. Liu, Y. Long, S. Renals, Pawel Swietojanski, P.C. Woodland

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We describe our work on developing a speech recognition system for multi-genre media archives. The high diversity of the data makes this a challenging recognition task, which may benefit from systems trained on a combination of in-domain and out-of-domain data. Working with tandem HMMs, we present Multi-level Adaptive Networks (MLAN), a novel technique for incorporating information from out-of-domain posterior features using deep neural networks. We show that it provides a substantial reduction in WER over other systems, with relative WER reductions of 15% over a PLP baseline, 9% over in-domain tandem features and 8% over the best out-of-domain tandem features.
Original language: English
Title of host publication: Spoken Language Technology Workshop (SLT), 2012 IEEE
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 324-329
Number of pages: 6
ISBN (Electronic): 978-1-4673-5124-9
ISBN (Print): 978-1-4673-5125-6
DOIs
Publication status: Published - 2012

Keywords

  • hidden Markov models
  • information retrieval systems
  • records management
  • speech recognition
  • MLAN
  • deep neural networks
  • hidden Markov model
  • in-domain tandem features
  • multigenre media archives transcription
  • multilevel adaptive networks
  • out-of-domain posterior features
  • relative WER reductions
  • speech recognition system
  • tandem HMM
  • Acoustics
  • Adaptation models
  • Hidden Markov models
  • Neural networks
  • Speech
  • Training
  • Training data
  • cross-domain adaptation
  • media archives
  • tandem
