TaL: a synchronised multi-speaker corpus of ultrasound tongue imaging, audio, and lip videos

Manuel Sam Ribeiro, Jennifer Sanger, Jing-Xuan Zhang, Aciel Eshky, Alan Wrench, Korin Richmond, Steve Renals

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present the Tongue and Lips corpus (TaL), a multi-speaker corpus of audio, ultrasound tongue imaging, and lip videos. TaL consists of two parts: TaL1 is a set of six recording sessions of one professional voice talent, a male native speaker of English; TaL80 is a set of recording sessions of 81 native speakers of English without voice talent experience. Overall, the corpus contains 24 hours of parallel ultrasound, video, and audio data, of which approximately 13.5 hours are speech. This paper describes the corpus and presents benchmark results for the tasks of speech recognition, speech synthesis (articulatory-to-acoustic mapping), and automatic synchronisation of ultrasound to audio. The TaL corpus is publicly available under the CC BY-NC 4.0 license.
Original language: English
Title of host publication: IEEE Spoken Language Technology Workshop (SLT 2021)
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 8
Publication status: Accepted/In press - 3 Nov 2020
Event: IEEE Spoken Language Technology Workshop
Duration: 19 Jan 2021 – 22 Jan 2021

Conference

Conference: IEEE Spoken Language Technology Workshop
Abbreviated title: SLT 2021
Period: 19/01/21 – 22/01/21

Keywords

  • ultrasound tongue imaging
  • video lip imaging
  • silent speech interface (SSI)
  • articulography
  • corpora
  • silent speech

