Using facial feedback to enhance turn-taking in a multimodal dialogue system

Michael White, Mary Ellen Foster, Jon Oberlander, Ash Brown

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We describe the results of an experiment investigating whether an avatar’s facial feedback can enhance turn-taking, undertaken as part of a usability study of a preliminary version of the COMIC multimodal dialogue system. The study focused on the phase of the interaction where the avatar embodies a virtual sales agent that guides the user through a range of possible tiling options for his or her newly redesigned bathroom. Our experiment employed a between-subjects design, where subjects used the system in one of two face conditions: (1) the “expressive” condition, where lip sync, blinking, facial expressions, gaze shifting and head turning were enabled; or (2) the “zombie” condition, where only lip sync was enabled. The results of the study were mixed, with some positive results on improving the interaction quality, but some unexpected negative results on task success and ease. On the positive side, the responses to our questionnaire indicated that the avatar’s thinking expression helped to convey that the system was busy processing input—confirming Edlund and Nordstrand’s (2002) finding—and that the facial expressions mitigated the system’s perceived sluggishness in responding verbally. However, after examining the videos of the interactions, we concluded that the avatar’s facial feedback—though helpful with some users—was unlikely to make up for the unnaturalness of the system’s half-duplex interaction on its own, and thus should be used together with explicit signals such as busy cursors. We did also find that the subjects in the expressive condition looked back at the avatar significantly more often than those in the zombie condition—confirming the results of Sidner et al. (2004)—but it was unclear whether this had any impact on turn-taking. 
Interestingly, recent research by de Ruiter (2005) revealed no systematic relationship between other-gaze and turn-taking in human-human dialogues involving relevant external visual representations, so in retrospect the absence of any significant impact of the avatar’s looking behaviour on turn-taking should perhaps be expected. With task success and ease, we were surprised to find that the subjects in the zombie condition scored significantly higher on several of our objective and perceived measures. One reason for the negative impact of the expressive face on task success and ease may have been that the expressive face distracted subjects from the task. Another possibility is that the expressive face raised users’ expectations of the system’s abilities, thereby encouraging subjects to use voice input rather than the mouse, which was generally a less successful strategy. We plan to investigate further with the final version of the system.
Original language: English
Title of host publication: Proceedings of HCI International 2005, Las Vegas
Publication status: Published - 2005
Event: 11th International Conference on Human-Computer Interaction (HCI 2005) - Las Vegas, NV, United States
Duration: 22 Jul 2005 → 27 Jul 2005

Conference

Conference: 11th International Conference on Human-Computer Interaction (HCI 2005)
Country/Territory: United States
City: Las Vegas, NV
Period: 22/07/05 → 27/07/05

Keywords / Materials (for Non-textual outputs)

  • multimodal dialogue system
  • facial feedback
  • facial expression
  • avatar facial feedback
  • expressive condition
  • zombie condition
  • virtual sales agent
  • possible tiling option
  • lip sync
  • perceived sluggishness
  • preliminary version
  • busy cursor
  • interaction quality
  • face condition
  • head turning
  • between-subjects design
  • positive result
  • gaze shifting
  • system half-duplex interaction
  • positive side
  • explicit signal
  • COMIC multimodal dialogue system
  • task success
  • unexpected negative result
  • usability study
