Edinburgh Research Explorer

Prosodically-enhanced Recurrent Neural Network Language Models

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • Siva Reddy Gangireddy
  • Steve Renals
  • Yoshihiko Nankaku
  • Akinobu Lee




Original language: English
Title of host publication: INTERSPEECH 2015 16th Annual Conference of the International Speech Communication Association
Number of pages: 5
Publication status: Published - Sep 2015


Recurrent neural network language models (RNNLMs) have been shown to consistently reduce the word error rates (WERs) of large vocabulary speech recognition tasks. In this work, we propose to enhance RNNLMs with prosodic features computed using the context of the current word. Since prosody features can plausibly be computed at both the word and the syllable level, we have trained models on prosody features computed at both of these levels. To investigate the effectiveness of the proposed models, we report perplexity and WER for two speech recognition tasks, Switchboard and TED. We observed substantial improvements in perplexity and small improvements in WER.

Index Terms: RNNLMs, 3-gram, prosody features, pause duration, duration of the word, syllable duration, syllable F0, GMM-HMM, DNN-HMM, Switchboard conversations and TED lectures


ID: 19957517