Speaker-Independent Mel-cepstrum Estimation from Articulator Movements Using D-vector Input

Kouichi Katsurada, Korin Richmond

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We describe a speaker-independent mel-cepstrum estimation system which accepts electromagnetic articulography (EMA) data as input. The system collects speaker information with d-vectors generated from the EMA data. We have also investigated the effect of speaker independence in the input vectors given to the mel-cepstrum estimator. This is accomplished by introducing a two-stage network, where the first stage is trained to output EMA sequences that are averaged across all speakers on a per-triphone basis (and so are speaker-independent) and the second receives these as input for mel-cepstrum estimation. Experimental results show that using the d-vectors can improve the performance of mel-cepstrum estimation by 0.19 dB with regard to mel-cepstrum distortion in the closed-speaker test set. Additionally, giving triphone-averaged EMA data to a mel-cepstrum estimator is shown to improve the performance by a further 0.16 dB, which indicates that the speaker-independent input has a positive effect on mel-cepstrum estimation.
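The improvements quoted in the abstract (0.19 dB and 0.16 dB) are measured as mel-cepstrum distortion. As a rough illustration of that metric, the following sketch computes the commonly used form of mel-cepstral distortion between time-aligned reference and estimated frames. The function name, the `skip_c0` option, and the toy data are assumptions made for illustration; the abstract does not state which coefficients or alignment the authors used.

```python
import math


def mel_cepstral_distortion(ref_frames, est_frames, skip_c0=True):
    """Mean mel-cepstral distortion in dB over time-aligned frames.

    Each frame is a sequence of mel-cepstral coefficients. Skipping the
    0th (energy-related) coefficient is a common convention, assumed here.
    """
    if len(ref_frames) != len(est_frames) or not ref_frames:
        raise ValueError("need two non-empty, equally long frame sequences")

    start = 1 if skip_c0 else 0
    scale = 10.0 / math.log(10.0)  # converts the log-spectral distance to dB

    total = 0.0
    for ref, est in zip(ref_frames, est_frames):
        # Squared Euclidean distance between the two coefficient vectors.
        squared = sum((r - e) ** 2 for r, e in zip(ref[start:], est[start:]))
        total += scale * math.sqrt(2.0 * squared)

    # Average the per-frame distortion over all frames.
    return total / len(ref_frames)


# Toy data (hypothetical values, not taken from the paper).
reference = [[1.0, 0.5, -0.2], [0.9, 0.4, -0.1]]
estimate = [[1.1, 0.4, -0.1], [0.8, 0.4, -0.3]]
print(f"MCD: {mel_cepstral_distortion(reference, estimate):.2f} dB")
```

Lower values indicate estimates closer to the reference, so an improvement of 0.19 dB corresponds to a reduction in this average.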
Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association
Place of Publication: Shanghai, China
Publisher: ISCA
Pages: 3176-3180
Publication status: Published - 25 Oct 2020
Event: Interspeech 2020 - Virtual Conference, China
Duration: 25 Oct 2020 to 29 Oct 2020
http://www.interspeech2020.org/

Publication series

Volume: 2020
ISSN (Print): 1990-9772

Conference

Conference: Interspeech 2020
Abbreviated title: INTERSPEECH 2020
Country/Territory: China
City: Virtual Conference
Period: 25/10/20 to 29/10/20
