Speaker-Independent Mel-cepstrum Estimation from Articulator Movements Using D-vector Input

Kouichi Katsurada, Korin Richmond

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

We describe a speaker-independent mel-cepstrum estimation system which accepts electromagnetic articulography (EMA) data as input. The system collects speaker information with d-vectors generated from the EMA data. We have also investigated the effect of speaker independence in the input vectors given to the mel-cepstrum estimator. This is accomplished by introducing a two-stage network, where the first stage is trained to output EMA sequences that are averaged across all speakers on a per-triphone basis (and so are speaker-independent) and the second receives these as input for mel-cepstrum estimation. Experimental results show that using the d-vectors can improve the performance of mel-cepstrum estimation by 0.19 dB with regard to mel-cepstrum distortion in the closed-speaker test set. Additionally, giving triphone-averaged EMA data to a mel-cepstrum estimator is shown to improve the performance by a further 0.16 dB, which indicates that the speaker-independent input has a positive effect on mel-cepstrum estimation.
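The improvements above are reported as mel-cepstrum distortion in dB. As a reading aid, the following is a minimal sketch of the commonly used definition of that metric, not the authors' evaluation code. It assumes the reference and estimated sequences are already time-aligned and that the 0th (energy) coefficient is excluded, which is a common convention that this record does not confirm for the paper. The function name `mel_cepstral_distortion` and the `skip_c0` parameter are hypothetical.

```python
import math


def mel_cepstral_distortion(ref_frames, est_frames, skip_c0=True):
    """Mean mel-cepstral distortion (dB) between two time-aligned sequences.

    Each sequence is a list of frames; each frame is a list of mel-cepstral
    coefficients. Excluding coefficient 0 is an assumption (common practice).
    """
    if not ref_frames or len(ref_frames) != len(est_frames):
        raise ValueError("sequences must be non-empty and time-aligned")

    start = 1 if skip_c0 else 0
    scale = 10.0 / math.log(10.0)  # converts the natural-log distance to dB

    total = 0.0
    for ref, est in zip(ref_frames, est_frames):
        # Squared Euclidean distance over the retained coefficients.
        squared = sum((r - e) ** 2 for r, e in zip(ref[start:], est[start:]))
        total += scale * math.sqrt(2.0 * squared)

    # Average the per-frame distortion over the whole sequence.
    return total / len(ref_frames)
```

Under this definition a lower value means the estimated mel-cepstra are closer to the reference, so the 0.19 dB and 0.16 dB figures in the abstract are reductions in distortion.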
Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association
Place of Publication: Shanghai, China
Publication status: Published - 25 Oct 2020
Event: Interspeech 2020 - Virtual Conference, China
Duration: 25 Oct 2020 to 29 Oct 2020

Publication series

ISSN (Print): 1990-9772


Conference: Interspeech 2020
Abbreviated title: INTERSPEECH 2020
City: Virtual Conference

