Template-Warping Based Speech Driven Head Motion Synthesis

David A. Braude, Hiroshi Shimodaira, Atef Ben Youssef

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We propose a method for synthesising head motion from speech using a combination of an Input-Output Markov model (IOMM) and Gaussian mixture models trained in a supervised manner. The key difference between this approach and others is that it models the head motion in each angle as a series of motion templates rather than trying to recover a frame-wise function. The templates were chosen to reflect natural patterns in the head motion, and the states of the IOMM were chosen based on statistics of the templates. This reduces the search space for the trajectories and rules out impossible motions such as discontinuities. For synthesis, our system warps the templates to account for the acoustic features and the other angles' warping parameters. We show that our system is capable of recovering the motion statistics that were chosen for the states. Our system was then compared to a baseline that uses a frame-wise mapping based on previously published work. A subjective preference test that included multiple speakers showed that participants preferred the segment-based approach. Both systems were trained on free storytelling speech.
Original language: English
Title of host publication: Interspeech 2013
Subtitle of host publication: 14th Annual Conference of the International Speech Communication Association
Pages: 2763-2767
Number of pages: 5
Publication status: Published - 1 Aug 2013
Event: INTERSPEECH 2013 - 14th Annual Conference of the International Speech Communication Association - Lyon, France
Duration: 25 Aug 2013 to 29 Aug 2013

Conference

Conference: INTERSPEECH 2013 - 14th Annual Conference of the International Speech Communication Association
Country: France
City: Lyon
Period: 25/08/13 to 29/08/13
