Abstract / Description of output
Neural sequence-to-sequence (S2S) models for text-to-speech synthesis (TTS) may take letter or phone input sequences. Since for many languages phones have a more direct relationship to the acoustic signal, they lead to improved quality. But generating phone transcriptions from text requires an expensive dictionary and an error-prone grapheme-to-phoneme (G2P) model, and the relative improvement over using letters has yet to be quantified. In approaching this question, we presume that letter-input S2S models must implicitly learn an internal counterpart to G2P conversion and therefore inevitably make errors. Such a model may thus be viewed as a phone-input S2S model with inaccurate phone input. To quantify this inaccuracy, in this paper we compare a letter-input S2S system to several phone-input systems trained on data with varying levels of error in the phonetic transcription. Our findings show that our letter-input system is equivalent in quality to the phone-input system in which 25% of word tokens in the training data have incorrect phonetic transcriptions. Furthermore, we find that for phone-input systems up to 15% of word tokens in the training data can have incorrect phonetic transcriptions without any significant difference in performance from a 0% error rate system. This suggests it is acceptable to use G2P to predict pronunciations for out-of-vocabulary words (OOVs) provided they account for less than around 15% of the training data, removing the need to manually add OOVs to the dictionary for every new training set.
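The abstract does not say how the erroneous transcriptions were produced. The sketch below is therefore only one hypothetical way to build training data in which a chosen fraction of word tokens carries an incorrect phonetic transcription. It assumes that each selected token gets exactly one phone replaced by a different phone drawn from the inventory. The function name `corrupt_transcriptions`, the `(word, phones)` token format and the single-substitution scheme are illustrative assumptions, not the authors' procedure.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical token format: (word, list of phones)
Token = Tuple[str, List[str]]


def corrupt_transcriptions(tokens: Sequence[Token], error_rate: float,
                           phone_set: Sequence[str], seed: int = 0) -> List[Token]:
    """Return a copy of `tokens` in which round(error_rate * len(tokens)) word
    tokens have an incorrect transcription.

    Assumption (not from the paper): a token is made incorrect by replacing
    one randomly chosen phone with a different phone from `phone_set`.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be in [0, 1]")
    if len(set(phone_set)) < 2:
        raise ValueError("phone_set needs at least two distinct phones")

    rng = random.Random(seed)  # fixed seed keeps the corrupted set reproducible
    n_errors = round(error_rate * len(tokens))

    # Only tokens with at least one phone can be corrupted.
    candidates = [i for i, (_, phones) in enumerate(tokens) if phones]
    if n_errors > len(candidates):
        raise ValueError("not enough non-empty transcriptions to corrupt")
    corrupt_idx = set(rng.sample(candidates, n_errors))  # sample without replacement

    out: List[Token] = []
    for i, (word, phones) in enumerate(tokens):
        phones = list(phones)  # copy so the input is never mutated
        if i in corrupt_idx:
            pos = rng.randrange(len(phones))
            alternatives = [p for p in set(phone_set) if p != phones[pos]]
            phones[pos] = rng.choice(sorted(alternatives))
        out.append((word, phones))
    return out
```

For example, with 20 word tokens and `error_rate=0.25`, exactly 5 tokens come back with one substituted phone and the other 15 are unchanged. A real study might instead take errors from an actual G2P model's output, which would produce more realistic, structured mistakes than uniform substitution.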
Original language | English |
---|---|
Title of host publication | Proceedings of the 10th ISCA Speech Synthesis Workshop |
Publisher | International Speech Communication Association |
Pages | 223-227 |
Number of pages | 5 |
DOIs | |
Publication status | Published - 20 Sept 2019 |
Event | The 10th ISCA Speech Synthesis Workshop - Austrian Museum of Folk Life and Folk Art, Vienna, Austria. Duration: 20 Sept 2019 → 22 Sept 2019. Conference number: 10. http://ssw10.oeaw.ac.at/index.html |
Publication series
Name | |
---|---|
Publisher | ISCA |
ISSN (Electronic) | 2312-2846 |
Conference
Conference | The 10th ISCA Speech Synthesis Workshop |
---|---|
Abbreviated title | SSW 2019 |
Country/Territory | Austria |
City | Vienna |
Period | 20/09/19 → 22/09/19 |
Internet address | http://ssw10.oeaw.ac.at/index.html |