Abstract / Description of output
Previous work in speech-driven head motion synthesis is centred around Hidden Markov Model (HMM) based methods and data that does not show a large variability of expressiveness in both speech and motion. When using expressive data, these systems often fail to produce satisfactory results. Recent studies have shown that using deep neural networks (DNNs) results in a better synthesis of head motion, in particular when employing bidirectional long short-term memory (BLSTM). We present a novel approach which makes use of DNNs with stacked bottleneck features combined with a BLSTM architecture to model context and expressive variability. Our proposed DNN architecture outperforms conventional feed-forward DNNs and simple BLSTM networks in an objective evaluation. Results from a subjective evaluation show a significant improvement of the bottleneck architecture over feed-forward DNNs.
Original language | English |
---|---|
Title of host publication | Intelligent Virtual Agents |
Subtitle of host publication | 16th International Conference, IVA 2016, Los Angeles, CA, USA, September 20-23, 2016, Proceedings |
Editors | David Traum, William Swartout, Peter Khooshabeh, Stefan Kopp, Stefan Scherer, Anton Leuski |
Place of Publication | Cham |
Publisher | Springer |
Pages | 198-207 |
Number of pages | 10 |
ISBN (Electronic) | 978-3-319-47665-0 |
ISBN (Print) | 978-3-319-47664-3 |
Publication status | Published - 19 Oct 2016 |
Event | 16th International Conference on Intelligent Virtual Agents, Los Angeles, United States. Duration: 20 Sept 2016 → 23 Sept 2016. http://iva2016.ict.usc.edu/ |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer International Publishing |
Volume | 10011 |
ISSN (Print) | 0302-9743 |
Conference
Conference | 16th International Conference on Intelligent Virtual Agents |
---|---|
Abbreviated title | IVA 2016 |
Country/Territory | United States |
City | Los Angeles |
Period | 20/09/16 → 23/09/16 |
Fingerprint
Research topics of 'Bidirectional LSTM Networks Employing Stacked Bottleneck Features for Expressive Speech-Driven Head Motion Synthesis'.
Profiles
- Hiroshi Shimodaira
  - School of Informatics - Senior Lecturer
  - Institute of Language, Cognition and Computation
  - Centre for Speech Technology Research
  - Language, Interaction, and Robotics
  - Person: Academic: Research Active