Abstract
Naturally expressive speech is important for a growing number of real-world speech synthesis applications, including augmentative and alternative communication aids and entertainment-based applications. One of the important challenges facing speech synthesis development today is how to produce reactive expressive speech, that is, speech in which various aspects of how it is spoken can be controlled in real time as the speech is produced. This is a challenge both in terms of the adaptability and latency of speech synthesis systems and in terms of how to provide a control mechanism for different situations. To explore these issues and raise awareness of them, we present a reactive speech synthesiser in which pitch and duration are controlled by hand movement via the skeleton tracking of a Microsoft Kinect sensor. We find that manipulating pitch and duration in real time is possible (and fun), but that it is difficult to produce meaningful expressiveness without an underlying model that allows a high-level representation of expressiveness to be used.
Original language | English |
---|---|
Pages (from-to) | 175-178 |
Number of pages | 4 |
Journal | IEICE technical report. Speech |
Volume | 112 |
Issue number | 369 |
Publication status | Published - 1 Dec 2012 |