Abstract / Description of output
In this paper we describe the architecture and design of a real-time text-to-speech (TTS) solution for the iPhone. As smartphones proliferate, new classes of user scenarios are becoming available, with eyes-free scenarios among the most important. We present a real-time solution that provides a high-quality voice experience and works both connected and disconnected. To keep the solution cost-effective for the user, we present a novel approach that allows bandwidth and performance to be scaled to application requirements. Server-side normalization and synthesis preselection resulted in a 100% performance improvement while decreasing bandwidth ten-fold relative to standard telephone audio. Scalable server-side pre-processing allows optional central control of text normalization, so the synthesis system can deal with unseen pronunciations and new normalization patterns without requiring any handset software upgrades.
Original language | English |
---|---|
Number of pages | 4 |
Publication status | Published - 2010 |
Event | Fifth Workshop on Speech in Mobile and Pervasive Environments - Lisbon, Portugal. Duration: 7 Sept 2010 → 7 Sept 2010 |
Workshop
Workshop | Fifth Workshop on Speech in Mobile and Pervasive Environments |
---|---|
Country/Territory | Portugal |
City | Lisbon |
Period | 7/09/10 → 7/09/10 |