Blind Speech Segmentation using Spectrogram Image-based Features and Mel Cepstral Coefficients

Adriana Stan, Cassia Valentini Botinhao, Bogdan Orza, Mircea Giurgiu

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

This paper introduces a novel method for blind speech segmentation at the phone level based on image processing. We treat the spectrogram of an utterance's waveform as an image and hypothesize that its striping defects, i.e. discontinuities, are caused by phone boundaries. These discontinuities are located with a simple image destriping algorithm. To detect phone transitions that are less salient in the image, we also compute spectral changes derived from the time evolution of the Mel cepstral parametrisation of speech. These so-called image-based and acoustic features are then combined into a mixed probability function whose values indicate the likelihood of a phone boundary at the corresponding time frame. The method is completely unsupervised and, on the TIMIT dataset, achieves an accuracy of 75.59% at an over-segmentation rate of -3.26%, yielding an F-measure of 0.76 and an R-value of 0.80.
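The abstract does not give the exact form of the acoustic feature or of the mixed probability function. As a minimal sketch under stated assumptions, the Python below uses a frame-to-frame Euclidean distance between Mel cepstral frames as the spectral-change track, min-max normalises it together with a precomputed image-based track (a hypothetical stand-in for the destriping output, not the authors' destriping algorithm), mixes the two with a convex weight, and keeps local maxima above a threshold as candidate phone boundaries. The function names, the weight and the threshold are illustrative assumptions, not values from the paper.

```python
import math
from typing import List, Sequence


def spectral_change(frames: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean distance between consecutive cepstral frames.

    frames[t] is the coefficient vector of frame t (e.g. Mel cepstral
    coefficients). Entry t of the result is the distance between frames
    t-1 and t; the first frame has no predecessor, so it gets 0.0.
    """
    if not frames:
        return []
    change = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        change.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(prev, cur))))
    return change


def normalise(values: Sequence[float]) -> List[float]:
    """Min-max scale a track to [0, 1] so it can be read as a pseudo-probability."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mixed_boundary_function(image_track: Sequence[float],
                            acoustic_track: Sequence[float],
                            weight: float = 0.5) -> List[float]:
    """Hypothetical convex mix of the normalised image and acoustic tracks."""
    img = normalise(image_track)
    ac = normalise(acoustic_track)
    return [weight * i + (1.0 - weight) * a for i, a in zip(img, ac)]


def pick_boundaries(score: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Frames at or above threshold that are strictly greater than the
    previous frame and not less than the next frame."""
    peaks = []
    for t in range(1, len(score) - 1):
        if score[t] >= threshold and score[t] > score[t - 1] and score[t] >= score[t + 1]:
            peaks.append(t)
    return peaks


# Toy example: two stationary cepstral segments with a jump at frame 3.
mfcc_frames = [[0.0, 0.0]] * 3 + [[3.0, 4.0]] * 3
image_track = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]  # hypothetical destriping output
acoustic_track = spectral_change(mfcc_frames)  # [0.0, 0.0, 0.0, 5.0, 0.0, 0.0]
mixed = mixed_boundary_function(image_track, acoustic_track, weight=0.5)
boundaries = pick_boundaries(mixed, threshold=0.5)
print(boundaries)  # [3]
```

On this toy input the mixed track is [0, 0, 0.25, 1, 0.25, 0], so frame 3 is the only candidate boundary. A convex combination keeps the mixed track in [0, 1], which lets a single threshold apply to both feature types; the weighting actually used in the paper may differ.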
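The abstract reports accuracy (hit rate) and over-segmentation alongside an F-measure and an R-value but does not define them. The sketch below assumes the definitions commonly used in blind segmentation work: over-segmentation OS = N_detected / N_reference - 1 (negative when fewer boundaries are detected than exist in the reference), precision = hit rate / (1 + OS), the F-measure as the harmonic mean of precision and hit rate, and the R-value as defined by Räsänen et al. (2009). The function name `segmentation_scores` is hypothetical.

```python
import math
from typing import Tuple


def segmentation_scores(hit_rate: float, over_segmentation: float) -> Tuple[float, float, float]:
    """Precision, F-measure and R-value from hit rate and over-segmentation.

    Both inputs are fractions, e.g. 0.7559 for 75.59% and -0.0326 for -3.26%.
    """
    # Detected boundaries = reference boundaries * (1 + OS), so
    # precision = hits / detected = hit_rate / (1 + OS).
    precision = hit_rate / (1.0 + over_segmentation)
    f_measure = 2.0 * precision * hit_rate / (precision + hit_rate)
    # R-value: distances from the ideal point (HR = 1, OS = 0).
    r1 = math.sqrt((1.0 - hit_rate) ** 2 + over_segmentation ** 2)
    r2 = (-over_segmentation + hit_rate - 1.0) / math.sqrt(2.0)
    r_value = 1.0 - (abs(r1) + abs(r2)) / 2.0
    return precision, f_measure, r_value


precision, f_measure, r_value = segmentation_scores(0.7559, -0.0326)
print(f"precision={precision:.3f} F={f_measure:.3f} R={r_value:.3f}")
```

With the reported hit rate and over-segmentation, these definitions give an R-value of about 0.802 and an F-measure of about 0.768, close to the reported 0.80 and 0.76; the small F-measure gap may come from rounding or from a slightly different definition in the paper.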
Original language: English
Title of host publication: 2016 IEEE Workshop on Spoken Language Technology
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 597-602
Number of pages: 6
ISBN (Print): 978-1-5090-4903-5
Publication status: Published - 9 Feb 2017
Event: 2016 IEEE Workshop on Spoken Language Technology - San Diego, United States
Duration: 13 Dec 2016 to 16 Dec 2016
https://www2.securecms.com/SLT2016//Default.asp

Conference

Conference: 2016 IEEE Workshop on Spoken Language Technology
Abbreviated title: SLT 2016
Country/Territory: United States
City: San Diego
Period: 13/12/16 to 16/12/16
