Edinburgh Research Explorer

Applying Vocal Tract Length Normalization to Meeting Recordings

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Original language: English
Title of host publication: Proceedings of the 9th European Conference on Speech Communication and Technology
Subtitle of host publication: Interspeech'2005 - Eurospeech
Publisher: ISCA
Pages: 265-268
Number of pages: 2
Publication status: Published - 2005
Event: 9th European Conference on Speech Communication and Technology (Interspeech 2005 - Eurospeech) - Lisbon, Portugal
Duration: 4 Sep 2005 to 8 Sep 2005

Conference

Conference: 9th European Conference on Speech Communication and Technology (Interspeech 2005 - Eurospeech)
Country: Portugal
City: Lisbon
Period: 4/09/05 to 8/09/05

Abstract

Vocal Tract Length Normalisation (VTLN) is a commonly used technique to normalise for inter-speaker variability. It is based on the speaker-specific warping of the frequency axis, parameterised by a scalar warp factor. This factor is typically estimated using maximum likelihood. We discuss how VTLN may be applied to multiparty conversations, reporting a substantial decrease in word error rate in experiments using the ICSI meetings corpus. We investigate the behaviour of the VTLN warping factor and show that a stable estimate is not obtained. Instead it appears to be influenced by the context of the meeting, in particular the current conversational partner. These results are consistent with predictions made by the psycholinguistic interactive alignment account of dialogue, when applied at the acoustic and phonological levels.
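As a rough illustration of the warping and estimation steps described in the abstract, the following is a minimal sketch and not the method used in the paper. It assumes a piecewise-linear warping function and a grid search over candidate warp factors; the function names, the 8 kHz upper frequency, the 0.875 break point, the 0.80 to 1.20 search range, and the toy Gaussian score over a few reference frequencies are all hypothetical choices made for this example. A real system would score warped acoustic features against a trained acoustic model instead.

```python
import math


def warp_frequency(f, alpha, f_max=8000.0, knee=0.875):
    """Piecewise-linear warp of frequency f (Hz) by scalar warp factor alpha.

    Below a break point the axis is scaled by alpha; above it the warp is
    interpolated linearly so that f_max maps onto itself.
    f_max and knee are assumed values for this sketch.
    """
    f_break = knee * f_max * min(1.0, 1.0 / alpha)
    if f <= f_break:
        return alpha * f
    slope = (f_max - alpha * f_break) / (f_max - f_break)
    return alpha * f_break + slope * (f - f_break)


def gaussian_log_likelihood(values, means, std):
    """Log-likelihood of values under independent Gaussians sharing one std."""
    norm = math.log(std * math.sqrt(2.0 * math.pi))
    return sum(-0.5 * ((v - m) / std) ** 2 - norm for v, m in zip(values, means))


def estimate_warp_factor(observed, reference, std=100.0, grid=None):
    """Grid-search the warp factor that maximises the toy likelihood.

    observed and reference are lists of frequencies (Hz); the Gaussian model
    centred on the reference values stands in for a real acoustic model.
    """
    if grid is None:
        # Hypothetical search range: 0.80 to 1.20 in steps of 0.02.
        grid = [0.80 + 0.02 * i for i in range(21)]

    def score(alpha):
        warped = [warp_frequency(f, alpha) for f in observed]
        return gaussian_log_likelihood(warped, reference, std)

    return max(grid, key=score)


if __name__ == "__main__":
    reference = [500.0, 1500.0, 2500.0]
    # A toy "speaker" whose frequencies are uniformly lower than the reference.
    observed = [f / 1.1 for f in reference]
    print(estimate_warp_factor(observed, reference))
```

Re-running such an estimate on different portions of a speaker's data, for example per conversational partner, is one way the stability of the warp factor could be examined.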

ID: 27399271