Abstract / Description of output
Sequence-to-sequence (S2S) TTS models like Tacotron have grapheme-only inputs when trained fully end-to-end. Grapheme inputs map to phone sounds depending on context, which is traditionally handled by extensive preprocessing in the TTS front-end. However, French orthography does not provide a clear one-to-one mapping between graphemes and sounds, and in English, which similarly has rather non-phonetic orthography, pronunciations are a significant cause of error in S2S-TTS with grapheme inputs. In this paper, we test implicit pronunciation knowledge where graphemes do not map directly to phones. Implicit pronunciation knowledge learnt in S2S-TTS is similar to that of a standalone grapheme-to-phoneme (G2P) model, which makes explicit phone predictions at the sequential level. We find that grapheme-input S2S-TTS makes implicit pronunciation errors similar to those of explicit G2P models, notably for foreign names. In a traditional front-end pipeline, there are also post-lexical rules which modify G2P output at the sequential level. In French, post-lexical rules require a deep knowledge of linguistic structure in a process called liaison. Without explicit rules, we find that S2S-TTS with grapheme inputs over-inserts liaison sounds, leading to a significant preference for a phone-based equivalent. By testing with linguistically motivated stimuli, we observe differences that would otherwise go undetected.
Original language | English |
---|---|
Title of host publication | Proc. 11th ISCA Speech Synthesis Workshop (SSW 11) |
Pages | 195--199 |
Publication status | Published - 28 Aug 2021 |
Event | The 11th ISCA Speech Synthesis Workshop (SSW11), Gárdony, Hungary. Duration: 26 Aug 2021 → 28 Aug 2021. Conference number: 11. https://ssw11.hte.hu |
Conference
Conference | The 11th ISCA Speech Synthesis Workshop (SSW11) |
---|---|
Abbreviated title | SSW11 |
Country/Territory | Hungary |
City | Gárdony |
Period | 26/08/21 → 28/08/21 |
Keywords / Materials (for Non-textual outputs)
- text-to-speech
- phoneme
- liaison
- enchaînement