Edinburgh Research Explorer

Embeddings for DNN speaker adaptive training

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Original language: English
Title of host publication: Proceedings of the 2019 IEEE Automatic Speech Recognition and Understanding Workshop
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 8
Publication status: Accepted/In press - 13 Sep 2019
Event: IEEE Automatic Speech Recognition and Understanding Workshop 2019 - Sentosa, Singapore
Duration: 14 Dec 2019 - 18 Dec 2019
http://asru2019.org/wp/

Conference

Conference: IEEE Automatic Speech Recognition and Understanding Workshop 2019
Abbreviated title: ASRU 2019
Country: Singapore
City: Sentosa
Period: 14/12/19 - 18/12/19

Abstract

In this work, we investigate the use of embeddings for speaker-adaptive training of DNNs (DNN-SAT), focusing on the case where only a small amount of adaptation data is available per speaker. DNN-SAT can be viewed as learning a mapping from each embedding to transformation parameters that are applied to the shared parameters of the DNN. We investigate different approaches to applying these transformations, and find that, with a good training strategy, a multi-layer adaptation network applied to all hidden layers is no more effective than a single linear layer acting on the embeddings to transform the input features. In the second part of our work, we evaluate different embeddings (i-vectors, x-vectors and deep CNN embeddings) in an additional speaker recognition task in order to gain insight into what should characterize an embedding for DNN-SAT. We find that the speaker recognition performance of a given representation is not correlated with its ASR performance; in fact, the ability to capture more speech attributes than just speaker identity was the most important characteristic of the embeddings for efficient DNN-SAT ASR. Our best models achieved relative word error rate (WER) gains of 4% and 9% over DNN baselines using speaker-level cepstral mean normalisation (CMN) and a fully speaker-independent model, respectively.
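
The simplest configuration mentioned in the abstract, a single linear layer acting on the embedding to transform the input features, can be illustrated with a short sketch. The abstract does not give the exact form of the transformation, so the version below assumes one plausible reading: the layer maps the embedding e to a shift W e + b that is added to every input frame of that speaker. All names (make_linear_layer, speaker_shift, adapt_features) and dimensions are hypothetical, and the weights are random placeholders rather than trained values; in a real system they would presumably be learned jointly with the acoustic model.

```python
import random


def make_linear_layer(embedding_dim, feature_dim, seed=0):
    """Return (weights, bias) for a hypothetical linear adaptation layer.

    Weights are random placeholders; a trained system would learn them.
    """
    rng = random.Random(seed)
    weights = [
        [rng.gauss(0.0, 0.1) for _ in range(embedding_dim)]
        for _ in range(feature_dim)
    ]
    bias = [0.0] * feature_dim
    return weights, bias


def speaker_shift(weights, bias, embedding):
    """Map one speaker embedding e to a per-dimension shift W e + b."""
    return [
        sum(w * e for w, e in zip(row, embedding)) + b
        for row, b in zip(weights, bias)
    ]


def adapt_features(frames, shift):
    """Add the same speaker-dependent shift to every acoustic frame.

    This additive form is an assumption, not the paper's stated method.
    """
    return [[x + s for x, s in zip(frame, shift)] for frame in frames]


# Toy usage: a 4-dimensional embedding and two 3-dimensional frames.
weights, bias = make_linear_layer(embedding_dim=4, feature_dim=3)
embedding = [0.5, -1.0, 0.25, 0.0]
frames = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
shift = speaker_shift(weights, bias, embedding)
adapted = adapt_features(frames, shift)
```

Under this reading, the shared network is unchanged across speakers; only the shift differs, and it is computed once per speaker (or utterance) from the embedding.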

Research areas

  • speaker embeddings, utterance summary vectors, speaker adaptive training

ID: 118997013