Edinburgh Research Explorer

A Study of the Recurrent Neural Network Encoder-Decoder for Large Vocabulary Speech Recognition

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Original languageEnglish
Title of host publicationINTERSPEECH 2015 16th Annual Conference of the International Speech Communication Association
Number of pages5
Publication statusPublished - Sep 2015


Deep neural networks have advanced the state-of-the-art in automatic speech recognition, when combined with hidden Markov models (HMMs). Recently there has been interest in using systems based on recurrent neural networks (RNNs)
to perform sequence modelling directly, without the requirement of an HMM superstructure. In this paper, we study the RNN encoder-decoder approach for large vocabulary end-toend speech recognition, whereby an encoder transforms a sequence of acoustic vectors into a sequence of feature representations,
from which a decoder recovers a sequence of words. We investigated this approach on the Switchboard corpus using a training set of around 300 hours of transcribed audio data. Without the use of an explicit language model or pronunciation lexicon, we achieved promising recognition accuracy, demonstrating that this approach warrants further investigation.
Index Terms: end-to-end speech recognition, deep neural networks,
recurrent neural networks, encoder-decoder.

Download statistics

No data available

ID: 19909909