Abstract
We present a system that enables efficient, collaborative human correction of ASR transcripts and is designed to operate in real-time settings, for example when post-editing live captions generated for news broadcasts. In the system, confusion networks derived from ASR lattices are used to highlight low-confidence words and present alternatives to the user for quick correction. The system uses a client-server architecture in which information about each manual edit is posted to the server. This information can be used to dynamically update the one-best ASR output for all utterances currently in the editing pipeline. We propose to make updates in three ways: by finding a new one-best path through an existing ASR lattice that is consistent with the correction received; by identifying further instances of out-of-vocabulary terms entered by the user; and by adapting the language model on the fly. Updates are received asynchronously by the client.
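The highlighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a confusion network is available as a sequence of slots, each mapping candidate words to posterior probabilities. The names `Slot` and `annotate`, the 0.7 threshold, the cap of three alternatives, and the example words and probabilities are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Slot:
    """One position in a confusion network (hypothetical representation)."""

    # Candidate words for this position mapped to posterior probabilities.
    candidates: dict[str, float]


def annotate(network: list[Slot], threshold: float = 0.7,
             max_alternatives: int = 3) -> list[dict]:
    """Pick the top word per slot and flag it when its posterior is low.

    The threshold and the number of alternatives are illustrative
    assumptions, not values taken from the paper.
    """
    annotated = []
    for slot in network:
        # Rank candidates from most to least probable.
        ranked = sorted(slot.candidates.items(),
                        key=lambda item: item[1], reverse=True)
        word, posterior = ranked[0]
        low_confidence = posterior < threshold
        # Offer alternatives only for words the editor should review.
        alternatives = (
            [w for w, _ in ranked[1:1 + max_alternatives]]
            if low_confidence else []
        )
        annotated.append({
            "word": word,
            "posterior": posterior,
            "low_confidence": low_confidence,
            "alternatives": alternatives,
        })
    return annotated


# Made-up example network for a three-word utterance.
network = [
    Slot({"the": 0.98, "a": 0.02}),
    Slot({"prime": 0.55, "crime": 0.40, "time": 0.05}),
    Slot({"minister": 0.93, "minster": 0.07}),
]

for item in annotate(network):
    marker = "?" if item["low_confidence"] else " "
    print(marker, item["word"], item["alternatives"])
```

In this sketch only the second slot falls below the threshold, so it is the only word shown with alternatives. A client could render such flagged words as highlighted and offer the alternatives as one-click replacements.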
Original language | English |
---|---|
Title of host publication | Interspeech 2017 |
Publisher | International Speech Communication Association |
Number of pages | 2 |
Publication status | Published - 24 Aug 2017 |
Event | Interspeech 2017, Stockholm, Sweden. Duration: 20 Aug 2017 → 24 Aug 2017. http://www.interspeech2017.org/ |
Publication series
Name | Interspeech |
---|---|
Publisher | International Speech Communication Association |
ISSN (Print) | 1990-9772 |
Conference
Conference | Interspeech 2017 |
---|---|
Country/Territory | Sweden |
City | Stockholm |
Period | 20/08/17 → 24/08/17 |