Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

We present a method for cross-lingual training of an ASR system using absolutely no transcribed training data from the target language, and with no phonetic knowledge of the language in question. Our approach uses a novel application of a decipherment algorithm, which operates given only unpaired speech and text data from the target language. We apply this decipherment to phone sequences generated by a universal phone recogniser trained on out-of-language speech corpora, which we follow with flat-start semi-supervised training to obtain an acoustic model for the new language. To the best of our knowledge, this is the first practical approach to zero-resource cross-lingual ASR which does not rely on any hand-crafted phonetic information. We carry out experiments on read speech from the GlobalPhone corpus, and show that it is possible to learn a decipherment model on just 20 minutes of data from the target language. When the decipherment model is used to generate pseudo-labels for semi-supervised training, we obtain WERs that range from 32.5% to just 1.9% absolute worse than the equivalent fully supervised models trained on the same data.
Original language: English
Title of host publication: Proceedings of Interspeech 2022
Editors: Hanseok Ko, John H. L. Hansen
Number of pages: 5
Publication status: Published - 18 Sept 2022
Event: Interspeech 2022 - Incheon, Korea, Republic of
Duration: 18 Sept 2022 to 22 Sept 2022
Conference number: 23


Conference: Interspeech 2022
Country/Territory: Korea, Republic of

Keywords / Materials (for Non-textual outputs)

  • automatic speech recognition
  • cross-lingual transfer
  • decipherment
  • semi-supervised training

