Using Prosodic Features in Language Models for Meetings

Songfang Huang, Steve Renals

Research output: Chapter in Book/Report/Conference proceeding › Chapter


Prosody has been actively studied as an important knowledge source for speech recognition and understanding. In this paper, we are concerned with the question of exploiting prosody for language models to aid automatic speech recognition in the context of meetings. Using an automatic syllable detection algorithm, syllable-based prosodic features are extracted to form the prosodic representation for each word. Two modeling approaches are then investigated. The first is based on a factored language model, which uses the prosodic representation directly and treats it as a ‘word’. Instead of direct association, the second approach provides a richer probabilistic structure within a hierarchical Bayesian framework by introducing an intermediate latent variable to represent similar prosodic patterns shared by groups of words. Four-fold cross-validation experiments on the ICSI Meeting Corpus show that exploiting prosody for language modeling can significantly reduce perplexity and also yields marginal reductions in word error rate.
Original language: English
Title of host publication: Machine Learning for Multimodal Interaction
Subtitle of host publication: 4th International Workshop, MLMI 2007, Brno, Czech Republic, June 28-30, 2007, Revised Selected Papers
Editors: Andrei Popescu-Belis, Steve Renals, Hervé Bourlard
Place of Publication: Berlin, Heidelberg
Publisher: Springer Berlin Heidelberg
Number of pages: 12
ISBN (Electronic): 978-3-540-78155-4
ISBN (Print): 978-3-540-78154-7
Publication status: Published - 2008
Event: 4th International Workshop MLMI 2007 - Brno, Czech Republic
Duration: 28 Jun 2007 to 30 Jun 2007

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Berlin Heidelberg
ISSN (Print): 0302-9743


Workshop: 4th International Workshop MLMI 2007
Country/Territory: Czech Republic

