Edinburgh Research Explorer

Punctuated Transcription of Multi-genre Broadcasts Using Acoustic and Lexical Approaches

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Open Access permissions: Open

Documents

http://ieeexplore.ieee.org/document/7846300/
Original language: English
Title of host publication: 2016 IEEE Workshop on Spoken Language Technology
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 433-440
Number of pages: 8
ISBN (Electronic): 978-1-5090-4903-5
Publication status: Published - 9 Feb 2017
Event: 2016 IEEE Workshop on Spoken Language Technology, San Diego, United States
Duration: 13 Dec 2016 to 16 Dec 2016
https://www2.securecms.com/SLT2016//Default.asp

Conference

Conference: 2016 IEEE Workshop on Spoken Language Technology
Abbreviated title: SLT 2016
Country: United States
City: San Diego
Period: 13/12/16 to 16/12/16

Abstract

In this paper we investigate the punctuated transcription of multi-genre broadcast media. We examine four systems: three are based on lexical features, and the fourth uses acoustic features by integrating punctuation into the speech recognition acoustic models. We also explore combining these component systems using voting and log-linear interpolation. We performed experiments on the English-language MGB Challenge data, which comprises about 1,600 hours of BBC television recordings. Our results indicate that a lexical system based on a neural machine translation approach is significantly better than the other systems, achieving an F-measure of 62.6% on reference text, with a relative degradation of 19% on ASR output. Our analysis of the results for specific punctuation marks indicates that using longer context improves the prediction of question marks, and that acoustic information improves the prediction of exclamation marks. Finally, we show that even though the systems are complementary, their straightforward combination does not yield a better F-measure.
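
The abstract names two ways of combining the component systems: voting and log-linear interpolation. The sketch below illustrates both generic techniques for a single token position. It is not the authors' implementation: the punctuation label set, the function names `majority_vote` and `log_linear_combine`, the example probabilities, and the equal weights are all hypothetical. Log-linear interpolation here means scoring each label as a weighted sum of the systems' log-probabilities and renormalising, i.e. p(y) proportional to the product over systems i of p_i(y)^w_i.

```python
import math
from collections import Counter

# Hypothetical punctuation label set (an assumption, not taken from the paper).
LABELS = ["NONE", "COMMA", "PERIOD", "QUESTION", "EXCLAMATION"]


def majority_vote(predictions):
    """Return the label predicted by the most systems for one token.

    Ties are broken in favour of the label that appears first in `predictions`.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def log_linear_combine(distributions, weights):
    """Log-linear interpolation: p(y) is proportional to prod_i p_i(y) ** w_i.

    Each distribution maps every label in LABELS to a probability; the result
    is renormalised so that it sums to one.
    """
    scores = {}
    for label in LABELS:
        # Weighted sum of log-probabilities; the floor avoids log(0).
        scores[label] = sum(
            w * math.log(max(d[label], 1e-12)) for d, w in zip(distributions, weights)
        )
    # Normalise with log-sum-exp for numerical stability.
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {label: math.exp(s - log_z) for label, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical per-token posteriors from a lexical and an acoustic system.
    lexical = {"NONE": 0.6, "COMMA": 0.3, "PERIOD": 0.05, "QUESTION": 0.03, "EXCLAMATION": 0.02}
    acoustic = {"NONE": 0.2, "COMMA": 0.5, "PERIOD": 0.1, "QUESTION": 0.1, "EXCLAMATION": 0.1}
    combined = log_linear_combine([lexical, acoustic], [0.5, 0.5])
    print(max(combined, key=combined.get), combined)
    print(majority_vote(["COMMA", "PERIOD", "COMMA"]))
```

Voting uses only each system's top decision, while log-linear interpolation keeps the full distributions, so a confident system can outweigh an uncertain one. As the abstract notes, such straightforward combinations did not improve F-measure in the reported experiments.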


ID: 28249113