Cyborg Speech: Deep Multilingual Speech Synthesis For Generating Segmental Foreign Accent With Natural Prosody

Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Mariko Kondo, Junichi Yamagishi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

We describe a new application of deep-learning-based speech synthesis, namely multilingual speech synthesis for generating controllable foreign accent. Specifically, we train a DBLSTM-based acoustic model on non-accented multilingual speech recordings from a speaker native in several languages. By copying durations and pitch contours from a pre-recorded utterance of the desired prompt, natural prosody is achieved. We call this paradigm “cyborg speech” as it combines human and machine speech parameters. Segmentally accented speech is produced by interpolating specific quinphone linguistic features towards phones from the other language that represent non-native mispronunciations. Experiments on synthetic American-English-accented Japanese speech show that subjective synthesis quality matches monolingual synthesis, that natural pitch is maintained, and that naturalistic phone substitutions generate output that is perceived as having an American foreign accent, even though only non-accented training data was used.
Index Terms— Multilingual speech synthesis, phonetic manipulation, foreign accent, DNN
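The interpolation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each of the five phone slots in a quinphone context is encoded as a one-hot vector over a combined phone inventory, and that accent is introduced by blending one slot towards a substitute phone with a weight `alpha`. The phone labels, the inventory, and the function names below are hypothetical and chosen only for illustration; the abstract does not specify the actual feature encoding.

```python
import numpy as np

# Hypothetical combined phone inventory (illustrative only, not the paper's phone set).
PHONES = ["a", "i", "u", "r_ja", "r_en", "t", "k", "sil"]
INDEX = {p: i for i, p in enumerate(PHONES)}


def one_hot(phone):
    """Return a one-hot vector for a phone in the combined inventory."""
    vec = np.zeros(len(PHONES))
    vec[INDEX[phone]] = 1.0
    return vec


def quinphone_features(context):
    """Stack one-hot vectors for the five phones of a quinphone context."""
    assert len(context) == 5
    return np.stack([one_hot(p) for p in context])  # shape (5, n_phones)


def interpolate_phone(features, position, substitute, alpha):
    """Move one quinphone slot a fraction alpha towards a substitute phone.

    alpha = 0 keeps the original phone; alpha = 1 is a full substitution.
    """
    out = features.copy()
    out[position] = (1.0 - alpha) * features[position] + alpha * one_hot(substitute)
    return out


# Blend the centre phone halfway from a Japanese-like /r/ to an English-like /r/.
native = quinphone_features(["k", "a", "r_ja", "a", "sil"])
accented = interpolate_phone(native, position=2, substitute="r_en", alpha=0.5)
```

Varying `alpha` between 0 and 1 would give a continuous control over how strongly the substituted phone is expressed, which is one way to read the "controllable" accent described above. In a full system the same phone would also appear in the neighbouring frames' context slots, which this sketch does not handle.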
Original language: English
Title of host publication: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subtitle of host publication: Calgary, AB, Canada
Place of Publication: Calgary, Alberta, Canada
Publisher: Institute of Electrical and Electronics Engineers
Pages: 4799-4803
Number of pages: 5
ISBN (Electronic): 978-1-5386-4658-8
ISBN (Print): 978-1-5386-4659-5
Publication status: Published - 13 Sept 2018
Event: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing - Calgary, Canada
Duration: 15 Apr 2018 to 20 Apr 2018
https://2018.ieeeicassp.org/
https://2018.ieeeicassp.org/default.asp

Publication series

Publisher: IEEE
ISSN (Electronic): 2379-190X

Conference

Conference: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing
Abbreviated title: ICASSP 2018
Country/Territory: Canada
City: Calgary
Period: 15/04/18 to 20/04/18
