Windowed Attention Mechanisms for Speech Recognition

Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The usual attention mechanisms used for encoder-decoder models do not constrain the relationship between input and output sequences to be monotonic. To address this, we explore windowed attention mechanisms, which restrict attention to a block of source hidden states. Rule-based windowing restricts attention to a (typically large) fixed-length window, and the performance of such methods is poor if the window size is small. In this paper, we propose a fully trainable windowed attention mechanism and provide a detailed analysis of the factors that affect its performance. Compared to the rule-based window methods, the learned window size is significantly smaller, yet the model's performance is competitive. On the TIMIT corpus, this approach yields a 17% (relative) performance improvement over the traditional attention model. Our model also yields accuracies comparable to those of the joint CTC-attention model on the Wall Street Journal corpus.
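As a rough illustration of the rule-based, fixed-length windowing the abstract contrasts against, the sketch below restricts a softmax to a block of source positions. It is not the authors' trainable mechanism: the function name `windowed_attention` and the parameters `center` and `half_width` are hypothetical, and the abstract does not say how the window position or size is chosen or learned.

```python
import math


def windowed_attention(scores, center, half_width):
    """Softmax over attention scores, restricted to a fixed window.

    Hypothetical helper for illustration only: positions outside
    [center - half_width, center + half_width] receive zero weight.
    """
    lo = max(0, center - half_width)
    hi = min(len(scores) - 1, center + half_width)
    if lo > hi:
        raise ValueError("window does not overlap the source sequence")

    window = scores[lo:hi + 1]
    peak = max(window)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in window]
    total = sum(exps)

    weights = [0.0] * len(scores)
    for offset, e in enumerate(exps):
        weights[lo + offset] = e / total
    return weights


# Example: six source hidden states, window of three centred on index 2.
scores = [0.0, 1.0, 2.0, 1.0, 0.0, 3.0]
weights = windowed_attention(scores, center=2, half_width=1)
```

Note that the highest raw score (index 5) gets no weight because it lies outside the window. With a small fixed window, a misplaced centre can exclude the relevant states altogether, which is one plausible reading of why the abstract reports poor performance for small rule-based windows.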
Original language: English
Title of host publication: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Place of Publication: Brighton, United Kingdom
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 7100-7104
Number of pages: 5
ISBN (Electronic): 978-1-4799-8131-1
ISBN (Print): 978-1-4799-8132-8
Publication status: E-pub ahead of print - 17 Apr 2019
Event: 44th International Conference on Acoustics, Speech, and Signal Processing: Signal Processing: Empowering Science and Technology for Humankind - Brighton, United Kingdom
Duration: 12 May 2019 - 17 May 2019
Conference number: 44
https://2019.ieeeicassp.org/

Publication series

Publisher: IEEE
ISSN (Print): 1520-6149
ISSN (Electronic): 2379-190X

Conference

Conference: 44th International Conference on Acoustics, Speech, and Signal Processing
Abbreviated title: ICASSP 2019
Country/Territory: United Kingdom
City: Brighton
Period: 12/05/19 - 17/05/19

Keywords

  • End-to-end
  • Speech recognition
  • Attention
