Abstract
We describe a speaker-independent mel-cepstrum estimation system that accepts electromagnetic articulography (EMA) data as input. The system captures speaker information with d-vectors generated from the EMA data. We also investigate the effect of making the input vectors given to the mel-cepstrum estimator speaker-independent. This is accomplished by introducing a two-stage network: the first stage is trained to output EMA sequences that are averaged across all speakers on a per-triphone basis (and so are speaker-independent), and the second stage receives these sequences as input for mel-cepstrum estimation. Experimental results show that using the d-vectors improves mel-cepstrum estimation by 0.19 dB in terms of mel-cepstrum distortion on the closed-speaker test set. Giving triphone-averaged EMA data to the mel-cepstrum estimator improves performance by a further 0.16 dB, which indicates that speaker-independent input has a positive effect on mel-cepstrum estimation.
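The improvements above are reported as mel-cepstrum distortion in dB. The abstract does not state the formula used, so the following is only a minimal sketch of the commonly used definition, assuming time-aligned reference and estimated frames and assuming the 0th (energy) coefficient is excluded. The function name `mel_cepstral_distortion` and the `skip_c0` parameter are hypothetical and not taken from the paper.

```python
import math


def mel_cepstral_distortion(reference, estimate, skip_c0=True):
    """Return the mean mel-cepstral distortion (dB) over aligned frames.

    `reference` and `estimate` are sequences of frames, each frame a
    sequence of mel-cepstral coefficients. Assumption: frames are already
    time-aligned and both sequences have the same length.
    """
    if not reference or len(reference) != len(estimate):
        raise ValueError("sequences must be non-empty and of equal length")

    # Commonly used scale factor: (10 / ln 10) * sqrt(2).
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    start = 1 if skip_c0 else 0  # optionally drop the energy term c0

    total = 0.0
    for ref_frame, est_frame in zip(reference, estimate):
        if len(ref_frame) != len(est_frame):
            raise ValueError("frames must have the same number of coefficients")
        squared_error = sum(
            (r - e) ** 2 for r, e in zip(ref_frame[start:], est_frame[start:])
        )
        total += scale * math.sqrt(squared_error)

    # Average the per-frame distortion over all frames.
    return total / len(reference)
```

Under this definition a lower value means the estimated mel-cepstrum is closer to the reference, so the 0.19 dB and 0.16 dB figures would correspond to reductions in this quantity.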
Original language | English |
---|---|
Title of host publication | Proceedings of the Annual Conference of the International Speech Communication Association |
Place of Publication | Shanghai, China |
Publisher | ISCA |
Pages | 3176-3180 |
Publication status | Published - 25 Oct 2020 |
Event | Interspeech 2020, Virtual Conference, China. Duration: 25 Oct 2020 → 29 Oct 2020. http://www.interspeech2020.org/ |
Publication series
Name | |
---|---|
Volume | 2020 |
ISSN (Print) | 1990-9772 |
Conference
Conference | Interspeech 2020 |
---|---|
Abbreviated title | INTERSPEECH 2020 |
Country/Territory | China |
City | Virtual Conference |
Period | 25/10/20 → 29/10/20 |