Articulatory features for speech-driven head motion synthesis

Atef Ben Youssef, Hiroshi Shimodaira, David A. Braude

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This study investigates the use of articulatory features for speech-driven head motion synthesis, as opposed to prosodic features such as F0 and energy, which have mainly been used in the literature. In the proposed approach, multi-stream HMMs are trained jointly on the synchronous streams of speech and head motion data. Articulatory features can be regarded as an intermediate parametrisation of speech that is expected to have a close link with head movement. Measured head and articulatory movements acquired by EMA were recorded synchronously with speech. Measured articulatory data were compared with those predicted from speech using an HMM-based inversion mapping system trained in a semi-supervised fashion. Canonical correlation analysis (CCA) on a data set of free speech from 12 people shows that the articulatory features are more correlated with head rotation than prosodic and/or cepstral speech features. It is also shown that head motion synthesised using articulatory features gives higher correlations with the original head motion than when only prosodic features are used.
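
The CCA comparison mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `canonical_correlations`, the feature dimensions, and all data below are hypothetical, and the arrays are synthetic random numbers standing in for articulatory, prosodic, and head rotation streams. The sketch assumes NumPy and full-column-rank feature matrices, and computes canonical correlations through the standard QR-plus-SVD route.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between X (n x p) and Y (n x q), largest first.

    Assumes both matrices have full column rank and n > max(p, q).
    """
    # Centre each feature stream so correlations are not affected by offsets.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the two streams.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the cosines of the principal angles,
    # which equal the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


# Synthetic stand-ins only (assumption): these are NOT the paper's data.
rng = np.random.default_rng(0)
n_frames = 500
head = rng.standard_normal((n_frames, 3))      # e.g. three head rotation angles
mixing = rng.standard_normal((3, 6))           # hypothetical linear coupling
artic = head @ mixing + 0.1 * rng.standard_normal((n_frames, 6))
prosody = rng.standard_normal((n_frames, 2))   # unrelated to head by construction

r_artic = canonical_correlations(artic, head)
r_pros = canonical_correlations(prosody, head)
print("articulatory vs head:", r_artic)
print("prosodic vs head:    ", r_pros)
```

Because the synthetic articulatory stream is built from the head stream while the prosodic stand-in is independent noise, the gap between the two printed sets of correlations is an artefact of the construction and says nothing about the reported results. It only shows how two feature sets would be compared against the same head motion target.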
Original language: English
Title of host publication: Proc. Interspeech
Pages: 2758-2762
Number of pages: 5
Publication status: Published - 1 Aug 2013
