Augmentation of adaptation data

Ravichander Vipperla, Steve Renals, Joe Frankel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Linear-regression-based speaker adaptation approaches can significantly improve Automatic Speech Recognition (ASR) accuracy for a target speaker. However, when the available adaptation data is limited to a few seconds, the accuracy of the speaker-adapted models is often worse than that of speaker-independent models. In this paper, we propose an approach to select a set of reference speakers who are acoustically close to the target speaker and whose data can be used to augment the adaptation data. To determine the acoustic similarity of two speakers, we propose a distance metric based on transforming sample points in the acoustic space with the regression matrices of the two speakers. We show the validity of this approach through a speaker identification task. ASR results on the SCOTUS and AMI corpora, with limited adaptation data of 10 to 15 seconds augmented by data from selected reference speakers, show a significant improvement in Word Error Rate over both speaker-independent and speaker-adapted models.
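The abstract does not give the exact form of the distance metric, so the following is only a minimal sketch of the general idea, under stated assumptions: each speaker is represented by an affine regression transform (a matrix `A` and a bias `b`), the same sample points are mapped through both speakers' transforms, and the mean Euclidean distance between the mapped points is taken as the dissimilarity. The function names `apply_affine`, `transform_distance`, and `closest_speakers`, as well as the choice of Euclidean averaging, are hypothetical and are not taken from the paper.

```python
import math


def apply_affine(A, b, x):
    """Return A @ x + b for a list-of-rows matrix A and vectors b, x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]


def transform_distance(t1, t2, samples):
    """Mean Euclidean distance between samples mapped by two transforms.

    Each transform is an (A, b) pair. This averaging rule is an assumption
    made for illustration, not the metric defined in the paper.
    """
    total = 0.0
    for x in samples:
        y1 = apply_affine(t1[0], t1[1], x)
        y2 = apply_affine(t2[0], t2[1], x)
        total += math.dist(y1, y2)
    return total / len(samples)


def closest_speakers(target, references, samples, k):
    """Return the names of the k reference speakers nearest to the target."""
    ranked = sorted(
        references,
        key=lambda name: transform_distance(target, references[name], samples),
    )
    return ranked[:k]


# Toy 2-D example with made-up transforms (not real speaker data).
identity = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
shifted_far = ([[1.0, 0.0], [0.0, 1.0]], [3.0, 4.0])
shifted_near = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 1.0])
samples = [[0.0, 0.0], [1.0, 1.0]]

nearest = closest_speakers(
    identity, {"far": shifted_far, "near": shifted_near}, samples, k=1
)
```

In this toy setup the reference speaker whose transform differs least from the target's is ranked first; in the setting the abstract describes, the data of the selected speakers would then be pooled with the target's few seconds of adaptation data.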
Original language: English
Title of host publication: INTERSPEECH 2010 11th Annual Conference of the International Speech Communication Association
Publisher: International Speech Communication Association
Pages: 530-533
Number of pages: 4
Publication status: Published - 2010
