We present a method for cross-lingual training of an ASR system using absolutely no transcribed training data from the target language, and with no phonetic knowledge of the language in question. Our approach uses a novel application of a decipherment algorithm, which operates given only unpaired speech and text data from the target language. We apply this decipherment to phone sequences generated by a universal phone recogniser trained on out-of-language speech corpora, which we follow with flat-start semi-supervised training to obtain an acoustic model for the new language. To the best of our knowledge, this is the first practical approach to zero-resource cross-lingual ASR which does not rely on any hand-crafted phonetic information. We carry out experiments on read speech from the GlobalPhone corpus, and show that it is possible to learn a decipherment model on just 20 minutes of data from the target language. When the decipherment model is used to generate pseudo-labels for semi-supervised training, we obtain word error rates (WERs) that range from 32.5% to just 1.9% absolute worse than the equivalent fully supervised models trained on the same data.
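The abstract reports its results as absolute WER differences. As a minimal sketch of what that measure means, the Python below computes WER with a standard word-level edit distance and then the absolute gap in percentage points. The function names `word_error_rate` and `absolute_gap` and the toy transcripts are illustrative assumptions, not code or data from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def absolute_gap(wer_system: float, wer_baseline: float) -> float:
    """Absolute WER difference in percentage points (inputs are fractions)."""
    return 100.0 * (wer_system - wer_baseline)


# Hypothetical toy transcripts, not results from the paper.
reference = "the cat sat on the mat"
supervised = "the cat sat on the mat"
pseudo_labelled = "the cat sat in the mat"

wer_sup = word_error_rate(reference, supervised)
wer_semi = word_error_rate(reference, pseudo_labelled)
gap = absolute_gap(wer_semi, wer_sup)
```

An absolute gap subtracts the two WERs directly, so a system at 21.9% WER against a baseline at 20.0% would be 1.9% absolute worse (these two figures are made up for illustration); a relative gap would instead divide the difference by the baseline WER.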
|Title of host publication||Proceedings of Interspeech 2022|
|Editors||Hanseok Ko, John H. L. Hansen|
|Number of pages||5|
|Publication status||Published - 18 Sep 2022|
|Event||Interspeech 2022 - Incheon, Korea, Republic of|
Duration: 18 Sep 2022 → 22 Sep 2022
Conference number: 23
|Country/Territory||Korea, Republic of|
|Period||18/09/22 → 22/09/22|
- automatic speech recognition
- cross-lingual transfer
- semi-supervised training
Paper title: Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR
Project (active): Unmute: Opening Spoken Language Interaction to the Currently Unheard
Bell, P., Goldwater, S. & Renals, S.
1/12/20 → 30/11/23