Articulatory feature-based methods for acoustic and audio-visual speech recognition: Summary from the 2006 JHU Summer Workshop

K. Livescu, Ö. Çetin, M. Hasegawa-Johnson, S. King, C. Bartels, N. Borges, A. Kantor, P. Lal, L. Yung, A. Bezman, S. Dawson-Haggerty, B. Woods, J. Frankel, M. Magimai-Doss, K. Saenko

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We report on investigations, conducted at the 2006 Johns Hopkins Workshop, into the use of articulatory features (AFs) for observation and pronunciation models in speech recognition. In the area of observation modeling, we use the outputs of AF classifiers both directly, in an extension of hybrid HMM/neural network models, and as part of the observation vector, an extension of the tandem approach. In the area of pronunciation modeling, we investigate a model having multiple streams of AF states with soft synchrony constraints, for both audio-only and audio-visual recognition. The models are implemented as dynamic Bayesian networks, and tested on tasks from the Small-Vocabulary Switchboard (SVitchboard) corpus and the CUAVE audio-visual digits corpus. Finally, we analyze AF classification and forced alignment using a newly collected set of feature-level manual transcriptions.
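As a rough illustration of the tandem idea mentioned in the abstract, the sketch below appends log-posteriors from per-frame articulatory-feature classifiers to the acoustic feature vector. It is a minimal sketch, not the authors' pipeline: the AF groups, their class counts, the 39-dimensional acoustic frames, and the stand-in classifier are all assumptions made for illustration.

```python
# Illustrative sketch only: tandem-style observation vectors that append
# articulatory-feature (AF) classifier outputs to acoustic features.
# The AF inventories, dimensions, and use of log-posteriors are assumptions,
# not the configuration used in the workshop systems.
import numpy as np

AF_GROUPS = {"place": 8, "degree": 4, "nasality": 2, "rounding": 2,
             "glottal": 3, "vowel": 20}  # hypothetical AF class counts

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class AFClassifier:
    """Stand-in for a trained per-frame neural-network AF classifier."""
    def __init__(self, in_dim, n_classes, rng):
        self.W = rng.standard_normal((in_dim, n_classes)) * 0.1
        self.b = np.zeros(n_classes)

    def posteriors(self, frames):                 # frames: (T, in_dim)
        return softmax(frames @ self.W + self.b)  # (T, n_classes)

def tandem_features(acoustic, classifiers, eps=1e-6):
    """Concatenate acoustic frames with log-posteriors of each AF classifier."""
    posts = [np.log(c.posteriors(acoustic) + eps) for c in classifiers]
    return np.hstack([acoustic] + posts)          # (T, in_dim + sum of class counts)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mfcc = rng.standard_normal((100, 39))         # 100 frames of 39-dim acoustic features
    clfs = [AFClassifier(39, n, rng) for n in AF_GROUPS.values()]
    obs = tandem_features(mfcc, clfs)
    print(obs.shape)                              # (100, 78) with these assumed inventories
```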
Original language: English
Title of host publication: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007 (ICASSP 2007)
Pages: 621-621
Number of pages: 4
Volume: 4
Publication status: Published - 1 Apr 2007

