DNN-based Speech Synthesis for Indian Languages from ASCII text

Srikanth Ronanki, Siva Reddy Gangireddy, Bajibabu Bollepalli, Simon King

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

Text-to-Speech synthesis in Indian languages has seen a lot of progress over the past decade, partly due to the annual Blizzard Challenges. These systems assume the text to be written in Devanagari or Dravidian scripts, which have nearly phonemic orthographies. However, the most common form of computer interaction among Indians is transliterated text written in ASCII. Such text is generally noisy, with many spelling variations for the same word. In this paper we evaluate three approaches to synthesizing speech from such noisy ASCII text: a naive Uni-Grapheme approach, a Multi-Grapheme approach, and a supervised Grapheme-to-Phoneme (G2P) approach. These methods first convert the ASCII text to a phonetic script, and then train a Deep Neural Network to synthesize speech from it. We train and test our models on Blizzard Challenge datasets that were transliterated to ASCII using crowdsourcing. Our experiments on Hindi, Tamil and Telugu demonstrate that our models generate speech from ASCII text that is of competitive quality compared to speech synthesized from the native scripts. All the accompanying transliterated datasets are released for public access.
Original language: English
Title of host publication: 9th ISCA Speech Synthesis Workshop
Pages: 74-79
Number of pages: 6
Publication status: Published - 15 Sept 2016
Event: 9th ISCA Speech Synthesis Workshop - Sunnyvale, United States
Duration: 13 Sept 2016 → 15 Sept 2016
http://ssw9.talp.cat/

Conference

Conference: 9th ISCA Speech Synthesis Workshop
Abbreviated title: ISCA 2016
Country/Territory: United States
City: Sunnyvale
Period: 13/09/16 → 15/09/16