Edinburgh Research Explorer

Sentence Boundary Detection in Broadcast Speech Transcripts

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Original language: English
Title of host publication: Proceedings of the ISCA ITRW on Automatic Speech Recognition (ASR2000)
Subtitle of host publication: Challenges for the new Millenium
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 228-235
Publication status: Published - 2000
Event: ASR2000 - Automatic Speech Recognition: Challenges for the new Millenium - Paris, France
Duration: 18 Sep 2000 to 20 Sep 2000

Workshop

Workshop: ASR2000 - Automatic Speech Recognition: Challenges for the new Millenium
Country: France
City: Paris
Period: 18/09/00 to 20/09/00

Abstract

This paper presents an approach to identifying sentence boundaries in broadcast speech transcripts. We describe finite state models that extract sentence boundary information statistically from text and audio sources. An n-gram language model is constructed from a collection of British English news broadcasts and scripts. An alternative model is estimated from pause duration information in speech recogniser outputs aligned with their programme script counterparts. Experimental results show that the pause duration model alone outperforms the language modelling approach, and that combining the two models improves performance further, attaining precision and recall scores of over 70% for the task.
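As a rough illustration of the idea in the abstract, the sketch below combines two per-position boundary probabilities, one standing in for the language model and one for the pause duration model, and scores the result with precision and recall. The log-linear interpolation, the 0.5 threshold, the function names and all numeric values are assumptions made for this example; they are not taken from the paper, which describes finite state models.

```python
import math


def combine(p_lm, p_pause, weight=0.5):
    """Log-linear interpolation of two boundary probabilities.

    This combination scheme is an assumption for illustration only,
    not the method described in the paper.
    """
    log_yes = weight * math.log(p_lm) + (1 - weight) * math.log(p_pause)
    log_no = weight * math.log(1 - p_lm) + (1 - weight) * math.log(1 - p_pause)
    # Renormalise over the two outcomes: boundary / no boundary.
    return 1.0 / (1.0 + math.exp(log_no - log_yes))


def precision_recall(predicted, reference):
    """Precision and recall of predicted boundary positions."""
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Toy, made-up probabilities for five word gaps (not experimental data).
lm_probs = [0.10, 0.60, 0.20, 0.70, 0.30]
pause_probs = [0.05, 0.80, 0.90, 0.90, 0.10]
reference = [1, 3]  # hypothetical true boundary positions

predicted = [
    i for i, (p_lm, p_pause) in enumerate(zip(lm_probs, pause_probs))
    if combine(p_lm, p_pause) > 0.5
]
precision, recall = precision_recall(predicted, reference)
print(predicted, precision, recall)
```

In this toy case a long pause at a position the language model considers unlikely still triggers a boundary, which costs precision but not recall; changing `weight` shifts how much each source is trusted.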


ID: 27450638