Bidirectional LSTM Networks Employing Stacked Bottleneck Features for Expressive Speech-Driven Head Motion Synthesis

Kathrin Haag, Hiroshi Shimodaira

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Previous work in speech-driven head motion synthesis is centred around Hidden Markov Model (HMM) based methods and data that does not show a large variability of expressiveness in both speech and motion. When using expressive data, these systems often fail to produce satisfactory results. Recent studies have shown that using deep neural networks (DNNs) results in a better synthesis of head motion, in particular when employing bidirectional long short-term memory (BLSTM). We present a novel approach which makes use of DNNs with stacked bottleneck features combined with a BLSTM architecture to model context and expressive variability. Our proposed DNN architecture outperforms conventional feed-forward DNNs and simple BLSTM networks in an objective evaluation. Results from a subjective evaluation show a significant improvement of the bottleneck architecture over feed-forward DNNs.

Original language: English
Title of host publication: Intelligent Virtual Agents
Subtitle of host publication: 16th International Conference, IVA 2016, Los Angeles, CA, USA, September 20–23, 2016, Proceedings
Editors: David Traum, William Swartout, Peter Khooshabeh, Stefan Kopp, Stefan Scherer, Anton Leuski
Place of Publication: Cham
Publisher: Springer International Publishing
Pages: 198-207
Number of pages: 10
ISBN (Electronic): 978-3-319-47665-0
ISBN (Print): 978-3-319-47664-3
Publication status: Published - 19 Oct 2016
Event: 16th International Conference on Intelligent Virtual Agents - Los Angeles, United States
Duration: 20 Sep 2016 – 23 Sep 2016
http://iva2016.ict.usc.edu/

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer International Publishing
Volume: 10011
ISSN (Print): 0302-9743

Conference

Conference: 16th International Conference on Intelligent Virtual Agents
Abbreviated title: IVA 2016
Country/Territory: United States
City: Los Angeles
Period: 20/09/16 – 23/09/16
