Scalable Mobile Implementation of High Quality Real Time Text to Speech Synthesis

Matthew Aylett, Timothy Kimball, Glenn Andert

Research output: Contribution to conference › Paper › peer-review

Abstract / Description of output

In this paper we describe the architecture and design of a real-time text-to-speech (TTS) solution for the iPhone. As smartphones proliferate, new classes of user scenarios are becoming available, with eyes-free scenarios among the most important. We present a real-time solution that provides a high-quality voice experience both when the device is disconnected and when it is connected. To keep the solution cost-effective for the user, we present a novel approach that allows bandwidth and performance to be scaled to application requirements. Server-side normalization and synthesis preselection resulted in a 100% performance improvement while reducing bandwidth to one tenth of that required by standard telephone audio. The use of scalable server-side pre-processing allows optional central control of text normalization, so that the synthesis system can handle unseen pronunciations and new normalization patterns without requiring any handset software upgrades.
Original language: English
Number of pages: 4
Publication status: Published - 2010
Event: Fifth Workshop on Speech in Mobile and Pervasive Environments - Lisbon, Portugal
Duration: 7 Sept 2010 - 7 Sept 2010

Workshop

Workshop: Fifth Workshop on Speech in Mobile and Pervasive Environments
Country/Territory: Portugal
City: Lisbon
Period: 7/09/10 - 7/09/10
