Edinburgh Research Explorer

Articulatory feature-based methods for acoustic and audio-visual speech recognition: Summary from the 2006 JHU Summer Workshop

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

  • K. Livescu
  • O. Çetin
  • M. Hasegawa-Johnson
  • S. King
  • C. Bartels
  • N. Borges
  • A. Kantor
  • Partha Lal
  • L. Yung
  • S. Bezman Dawson-Haggerty
  • B. Woods
  • J. Frankel
  • M. Magimai-Doss
  • K. Saenko

Open Access permissions: Open

Documents

    Rights statement: © Livescu, K., Çetin, O., Hasegawa-Johnson, M., King, S., Bartels, C., Borges, N., ... Saenko, K. (2007). Articulatory feature-based methods for acoustic and audio-visual speech recognition: Summary from the 2006 JHU Summer Workshop. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007 (ICASSP 2007). (Vol. 4, pp. 621-621). 10.1109/ICASSP.2007.366989

    Accepted author manuscript, 140 KB, PDF-document

Original language: English
Title of host publication: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007 (ICASSP 2007)
Pages: 621-621
Number of pages: 4
Volume: 4
DOI: 10.1109/ICASSP.2007.366989
Publication status: Published, 1 Apr 2007

Abstract

We report on investigations, conducted at the 2006 Johns Hopkins Workshop, into the use of articulatory features (AFs) for observation and pronunciation models in speech recognition. In the area of observation modeling, we use the outputs of AF classifiers both directly, in an extension of hybrid HMM/neural network models, and as part of the observation vector, in an extension of the tandem approach. In the area of pronunciation modeling, we investigate a model having multiple streams of AF states with soft synchrony constraints, for both audio-only and audio-visual recognition. The models are implemented as dynamic Bayesian networks and tested on tasks from the Small-Vocabulary Switchboard (SVitchboard) corpus and the CUAVE audio-visual digits corpus. Finally, we analyze AF classification and forced alignment using a newly collected set of feature-level manual transcriptions.
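The tandem idea mentioned above, using AF classifier outputs as part of the observation vector, can be illustrated with a minimal sketch. It assumes that each frame has an acoustic feature vector and, for each AF group, a posterior distribution from a classifier; the group names ("place", "voicing"), the log-posterior transform, and the probability floor are illustrative assumptions, not the workshop's actual configuration. Tandem systems commonly also decorrelate or reduce the log posteriors (for example with PCA) before concatenation; that step is omitted here.

```python
import math
from typing import Dict, List, Sequence

# Floor applied before taking logs so a zero posterior does not give -inf (assumed value).
EPS = 1e-10


def tandem_frame(acoustic: Sequence[float],
                 af_posteriors: Dict[str, Sequence[float]]) -> List[float]:
    """Append log AF-classifier posteriors to one frame's acoustic features.

    `af_posteriors` maps a hypothetical AF group name to that group's
    posterior distribution for this frame.
    """
    out = list(acoustic)
    # Sort group names so every frame uses the same vector layout.
    for group in sorted(af_posteriors):
        probs = af_posteriors[group]
        if abs(sum(probs) - 1.0) > 1e-6:
            raise ValueError(f"posteriors for {group!r} do not sum to 1")
        out.extend(math.log(max(p, EPS)) for p in probs)
    return out


def tandem_utterance(acoustic_frames: Sequence[Sequence[float]],
                     af_posterior_frames: Sequence[Dict[str, Sequence[float]]]
                     ) -> List[List[float]]:
    """Build tandem observation vectors frame by frame for a whole utterance."""
    if len(acoustic_frames) != len(af_posterior_frames):
        raise ValueError("acoustic and AF posterior frame counts differ")
    return [tandem_frame(a, p) for a, p in zip(acoustic_frames, af_posterior_frames)]


if __name__ == "__main__":
    frame = tandem_frame([1.0, -0.5],
                         {"place": [0.7, 0.2, 0.1], "voicing": [0.9, 0.1]})
    print(len(frame))  # 2 acoustic values + 3 "place" + 2 "voicing" log posteriors
```

The resulting vectors would then be modeled by ordinary Gaussian-mixture observation distributions, which is what distinguishes the tandem approach from the hybrid approach, where the classifier outputs are used directly as (scaled) likelihoods.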

ID: 2076426