Fully Unsupervised Small-Vocabulary Speech Recognition Using a Segmental Bayesian Model

Herman Kamper, Aren Jansen, Sharon Goldwater

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Current supervised speech technology relies heavily on transcribed speech and pronunciation dictionaries. In settings where unlabelled speech data alone is available, unsupervised methods are required to discover categorical linguistic structure directly from the audio. We present a novel Bayesian model which segments unlabelled input speech into word-like units, resulting in a complete unsupervised transcription of the speech in terms of discovered word types. In our approach, a potential word segment (of arbitrary length) is embedded in a fixed-dimensional space; the model (implemented as a Gibbs sampler) then builds a whole-word acoustic model in this space while jointly doing segmentation. We report word error rates in a connected digit recognition task by mapping the unsupervised output to ground truth transcriptions. Our model outperforms a previously developed HMM-based system, even when the model is not constrained to discover only the 11 word types present in the data.
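As an illustration of the evaluation step described in the abstract, the sketch below maps discovered word types to ground-truth labels and then scores word error rate. It is a minimal sketch, not the authors' evaluation code. It assumes a many-to-one mapping in which each discovered cluster takes the true word it most often co-occurs with, and it assumes the (cluster, true word) pairs have already been obtained by aligning discovered segments with the reference transcription. The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def map_clusters_to_words(pairs):
    """Assumed mapping: each cluster ID gets its most frequent true word.

    pairs: iterable of (cluster_id, true_word) for aligned tokens.
    """
    counts = defaultdict(Counter)
    for cluster_id, word in pairs:
        counts[cluster_id][word] += 1
    return {c: ctr.most_common(1)[0][0] for c, ctr in counts.items()}


def edit_distance(ref, hyp):
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(refs, hyps):
    """Total edit errors divided by total number of reference words."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


# Toy data (hypothetical): cluster 7 mostly overlaps "one", cluster 3 "two".
pairs = [(7, "one"), (7, "one"), (7, "nine"), (3, "two"), (3, "two")]
mapping = map_clusters_to_words(pairs)

refs = [["one", "two", "nine"]]   # ground-truth transcription
cluster_output = [[7, 3, 7]]      # unsupervised output as cluster IDs
hyps = [[mapping[c] for c in utt] for utt in cluster_output]
wer = word_error_rate(refs, hyps)
```

With a many-to-one mapping of this kind, the model may discover more clusters than there are true word types and still be scored, which is relevant to the setting where the number of word types is not constrained to 11.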
Original language: English
Title of host publication: Proceedings of Interspeech 2015
Publication status: Published - 2015

