A Semi-Markov Model for Speech Segmentation with an Utterance-Break Prior

Mark Sinclair, Peter Bell, Alexandra Birch, Fergus McInnes

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Speech segmentation is the problem of finding the end points of a speech utterance for passing to an automatic speech recognition (ASR) system. The quality of this segmentation can have a large impact on the accuracy of the ASR system; in this paper we demonstrate that it can have an even larger impact on downstream natural language processing tasks, in this case machine translation. We develop a novel semi-Markov model which segments audio streams into speech utterances optimised for the desired distribution of sentence lengths in the target domain. We compare this with existing state-of-the-art methods and show that it not only improves ASR performance but also yields significant benefits on a speech translation task.
Original language: English
Title of host publication: INTERSPEECH 2014 15th Annual Conference of the International Speech Communication Association
Publisher: International Speech Communication Association
Pages: 2351-2355
Number of pages: 5
Publication status: Published - 2014
