Bidirectional LSTM Networks Employing Stacked Bottleneck Features for Expressive Speech-Driven Head Motion Synthesis

Kathrin Haag, Hiroshi Shimodaira

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract / Description of output

Previous work in speech-driven head motion synthesis is centred around Hidden Markov Model (HMM) based methods and data that does not show a large variability of expressiveness in both speech and motion. When using expressive data, these systems often fail to produce satisfactory results. Recent studies have shown that using deep neural networks (DNNs) results in a better synthesis of head motion, in particular when employing bidirectional long short-term memory (BLSTM). We present a novel approach which makes use of DNNs with stacked bottleneck features combined with a BLSTM architecture to model context and expressive variability. Our proposed DNN architecture outperforms conventional feed-forward DNNs and simple BLSTM networks in an objective evaluation. Results from a subjective evaluation show a significant improvement of the bottleneck architecture over feed-forward DNNs.
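The abstract names stacked bottleneck features without detailing them. In speech processing, the term usually denotes a two-stage scheme: a first network with a narrow hidden (bottleneck) layer is trained, its bottleneck activations are concatenated over neighbouring frames, and the stacked vectors are fed to a second network. The following is a minimal sketch of that generic scheme in plain Python, not the authors' implementation. The helper names `bottleneck_layer` and `stack_context`, the single tanh layer, the toy weights, the ±2-frame context window and the edge padding are all hypothetical choices. The BLSTM stage is omitted.

```python
import math
from typing import List, Sequence

Vector = List[float]


def bottleneck_layer(frame: Sequence[float],
                     weights: Sequence[Sequence[float]],
                     bias: Sequence[float]) -> Vector:
    """Hypothetical stand-in for the narrow bottleneck layer of a first-stage DNN.

    A trained network would have several hidden layers before this one; here a
    single tanh layer maps one acoustic frame to len(bias) bottleneck units.
    """
    return [math.tanh(sum(w * x for w, x in zip(row, frame)) + b)
            for row, b in zip(weights, bias)]


def stack_context(frames: Sequence[Sequence[float]],
                  offsets: Sequence[int]) -> List[Vector]:
    """Concatenate, for every time step t, the frames at t + offset.

    Indices outside the utterance are clamped to the first/last frame
    (edge padding); this padding choice is an assumption.
    """
    n = len(frames)
    stacked: List[Vector] = []
    for t in range(n):
        row: Vector = []
        for off in offsets:
            idx = min(max(t + off, 0), n - 1)  # clamp to valid frame range
            row.extend(frames[idx])
        stacked.append(row)
    return stacked


if __name__ == "__main__":
    # Toy example: 4 acoustic frames of dimension 3 (values are made up).
    acoustic = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, 0.0], [-0.2, 0.1, 0.1]]
    # Hypothetical 2-unit bottleneck; these weights would normally be learned.
    W = [[0.5, -0.3, 0.2], [0.1, 0.4, -0.6]]
    b = [0.0, 0.1]
    bn = [bottleneck_layer(f, W, b) for f in acoustic]
    # Stack a +/-2 frame context window: 5 offsets x 2 units = 10 features per frame.
    stacked = stack_context(bn, range(-2, 3))
    print(len(stacked), len(stacked[0]))  # 4 10
    # In a full system, sequences like `stacked` would feed the BLSTM stage (not shown).
```

Stacking the bottleneck outputs over several frames gives the second network explicit local context. A bidirectional recurrent layer adds longer-range context from both past and future frames, which is the combination the abstract describes.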
Original language: English
Title of host publication: Intelligent Virtual Agents
Subtitle of host publication: 16th International Conference, IVA 2016, Los Angeles, CA, USA, September 20–23, 2016, Proceedings
Editors: David Traum, William Swartout, Peter Khooshabeh, Stefan Kopp, Stefan Scherer, Anton Leuski
Place of Publication: Cham
Publisher: Springer International Publishing
Number of pages: 10
ISBN (Electronic): 978-3-319-47665-0
ISBN (Print): 978-3-319-47664-3
Publication status: Published - 19 Oct 2016
Event: 16th International Conference on Intelligent Virtual Agents - Los Angeles, United States
Duration: 20 Sept 2016 – 23 Sept 2016

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer International Publishing
ISSN (Print): 0302-9743


Conference: 16th International Conference on Intelligent Virtual Agents
Abbreviated title: IVA 2016
Country/Territory: United States
City: Los Angeles


