Abstract
This paper introduces a novel method for blind speech segmentation at the phone level based on image processing. We treat the spectrogram of an utterance's waveform as an image and hypothesize that its striping defects, i.e. discontinuities, are caused by phone boundaries. These discontinuities are found using a simple image destriping algorithm. To discover phone transitions that are less salient in the image, we compute spectral changes derived from the time evolution of the Mel cepstral parametrisation of speech. These so-called image-based and acoustic features are then combined to form a mixed probability function, whose values indicate the likelihood of a phone boundary being located at the corresponding time frame. The method is completely unsupervised and achieves an accuracy of 75.59% at an over-segmentation rate of -3.26%, yielding an F-measure of 0.76 and an R-value of 0.80 on the TIMIT dataset.
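The reported R-value can be checked against the accuracy and over-segmentation figures. The sketch below assumes that "accuracy" is the boundary hit rate and that the R-value follows its commonly used definition in terms of hit rate and over-segmentation (both in percent); the abstract does not state the formula, so this is an illustrative check rather than the authors' evaluation code. The function name `r_value` is hypothetical.

```python
import math


def r_value(hit_rate_pct: float, over_seg_pct: float) -> float:
    """R-value from hit rate (HR) and over-segmentation (OS), both in percent.

    Assumed definition (not given in the abstract):
        r1 = sqrt((100 - HR)^2 + OS^2)
        r2 = (-OS + HR - 100) / sqrt(2)
        R  = 1 - (|r1| + |r2|) / 200
    """
    # Distance from the ideal operating point (HR = 100, OS = 0).
    r1 = math.hypot(100.0 - hit_rate_pct, over_seg_pct)
    # Signed term that penalises trading hit rate for extra boundaries.
    r2 = (-over_seg_pct + hit_rate_pct - 100.0) / math.sqrt(2.0)
    return 1.0 - (abs(r1) + abs(r2)) / 200.0


# Figures quoted in the abstract: 75.59% accuracy, -3.26% over-segmentation.
print(round(r_value(75.59, -3.26), 2))  # prints 0.8
```

Under these assumptions the quoted accuracy and over-segmentation rate reproduce the reported R-value of 0.80. A perfect segmentation (100% hit rate, 0% over-segmentation) gives an R-value of 1.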
Original language | English |
---|---|
Title of host publication | 2016 IEEE Workshop on Spoken Language Technology |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Pages | 597-602 |
Number of pages | 6 |
ISBN (Print) | 978-1-5090-4903-5 |
Publication status | Published - 9 Feb 2017 |
Event | 2016 IEEE Workshop on Spoken Language Technology, San Diego, United States. Duration: 13 Dec 2016 → 16 Dec 2016. https://www2.securecms.com/SLT2016//Default.asp |
Conference
Conference | 2016 IEEE Workshop on Spoken Language Technology |
---|---|
Abbreviated title | SLT 2016 |
Country/Territory | United States |
City | San Diego |
Period | 13/12/16 → 16/12/16 |
Paper title: Blind Speech Segmentation using Spectrogram Image-based Features and Mel Cepstral Coefficients