Reactive Control of Expressive Speech Synthesis Using Kinect Skeleton Tracking

Robert Clark, Magdalena Anna Konkiewicz, Maria Astrinaki, Junichi Yamagishi

Research output: Contribution to journal › Article › peer-review

Abstract

Naturally expressive speech is important for an increasing number of real-world speech synthesis applications, including augmentative and alternative communication aids and entertainment-based applications. One of the important challenges facing speech synthesis development today is how to produce reactive expressive speech, that is, speech where various aspects of how the speech is said can be controlled in real time as the speech is produced. This is a challenge both in terms of the adaptability and latency of speech synthesis systems and in terms of how to provide a control mechanism suited to different situations. To explore these issues and to raise awareness of them, we present a reactive speech synthesiser in which pitch and duration are controlled by hand movement via the skeleton tracking of a Microsoft Kinect sensor. We find that manipulating pitch and duration in real time is possible (and fun), but that it is difficult to produce meaningful expressiveness without an underlying model that allows a high-level representation of expressiveness to be used.
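
As a rough illustration of the kind of control mapping the abstract describes, the Python sketch below turns normalized hand coordinates from a skeleton tracker into a pitch shift and a duration scaling factor that a reactive synthesiser could consume. The function name, parameter ranges, and mapping directions are assumptions for illustration only and are not the authors' implementation or any Kinect SDK API.

# Hypothetical sketch: map normalized hand position to synthesis controls.
# Assumes hand coordinates are already normalized to the range 0.0-1.0.

def hand_to_controls(hand_y, hand_x,
                     pitch_range_semitones=12.0,
                     duration_range=(0.5, 2.0)):
    """Map a normalized hand position to (pitch shift, duration scale).

    hand_y: vertical position, 0.0 = lowest, 1.0 = highest.
    hand_x: horizontal position, 0.0 = leftmost, 1.0 = rightmost.
    """
    # Vertical position controls pitch: centre = no shift,
    # top = +range/2 semitones, bottom = -range/2 semitones.
    pitch_shift = (hand_y - 0.5) * pitch_range_semitones

    # Horizontal position controls duration: moving the hand to the right
    # interpolates from the slowest to the fastest assumed scaling factor.
    slow, fast = duration_range[1], duration_range[0]
    duration_scale = slow + hand_x * (fast - slow)

    return pitch_shift, duration_scale


if __name__ == "__main__":
    # Example: hand raised high and to the right -> higher pitch, faster speech.
    shift, scale = hand_to_controls(hand_y=0.9, hand_x=0.8)
    print(f"pitch shift: {shift:+.1f} semitones, duration scale: {scale:.2f}x")

In a real system these two values would be recomputed on every skeleton frame and pushed to the synthesiser with low latency, which is where the adaptability and latency challenges discussed in the abstract arise.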
Original language: English
Pages (from-to): 175-178
Number of pages: 4
Journal: IEICE technical report. Speech
Volume: 112
Issue number: 369
Publication status: Published - 1 Dec 2012
