Edinburgh Research Explorer

Speaker-Independent Classification of Phonetic Segments from Raw Ultrasound in Child Speech

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Related Edinburgh Organisations

Open Access permissions



  • Download as Adobe PDF

    Rights statement: © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

    Accepted author manuscript, 2.01 MB, PDF document

    Licence: All Rights Reserved

Original languageEnglish
Title of host publicationICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Place of PublicationBrighton, United Kingdom
PublisherInstitute of Electrical and Electronics Engineers (IEEE)
Number of pages5
ISBN (Electronic)978-1-4799-8131-1
ISBN (Print)978-1-4799-8132-8
Publication statusE-pub ahead of print - 17 Apr 2019
Event44th International Conference on Acoustics, Speech, and Signal Processing: Signal Processing: Empowering Science and Technology for Humankind - Brighton , United Kingdom
Duration: 12 May 201917 May 2019
Conference number: 44

Publication series

ISSN (Print)1520-6149
ISSN (Electronic)2379-190X


Conference44th International Conference on Acoustics, Speech, and Signal Processing
Abbreviated titleICASSP 2019
CountryUnited Kingdom
Internet address


Ultrasound tongue imaging (UTI) provides a convenient way to visualize the vocal tract during speech production. UTI is increasingly being used for speech therapy, making it important to develop automatic methods to assist various time-consuming manual tasks currently performed by speech therapists. A key challenge is to generalize the automatic processing of ultrasound tongue images to previously unseen speakers. In this work, we investigate the classification of phonetic segments (tongue shapes) from raw ultrasound recordings under several training scenarios: speaker-dependent, multi-speaker, speaker-independent, and speaker-adapted. We observe that models underperform when applied to data from speakers not seen at training time. However, when provided with minimal additional speaker information, such as the mean ultrasound frame, the models generalize better to unseen speakers.

    Research areas

  • ultrasound, ultrasound tongue imaging, speaker independent, speech therapy, Child Speech

Download statistics

No data available

ID: 80034643