Prosodically-enhanced Recurrent Neural Network Language Models

Siva Reddy Gangireddy, Steve Renals, Yoshihiko Nankaku, Akinobu Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Recurrent neural network language models (RNNLMs) have been shown to consistently reduce the word error rates (WERs) of large vocabulary speech recognition tasks. In this work we propose to enhance RNNLMs with prosodic features computed using the context of the current word. Since prosodic features can plausibly be computed at both the word level and the syllable level, we have trained models on features computed at each of these levels. To investigate the effectiveness of the proposed models, we report perplexity and WER for two speech recognition tasks, Switchboard and TED. We observed substantial improvements in perplexity and small improvements in WER.
Index Terms: RNNLMs, 3-gram, prosody features, pause duration, duration of the word, syllable duration, syllable F0, GMM-HMM, DNN-HMM, Switchboard conversations and TED lectures
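The abstract does not say how the prosodic features enter the network. One common way to give an RNNLM side information is a feature layer that feeds the hidden layer, as in context-dependent RNNLMs; the Python sketch below assumes that design and is not the authors' model. It runs a forward pass of an untrained Elman RNNLM whose hidden state at each step combines the current word, the previous hidden state, and a per-word prosody vector (here a hypothetical `[pause_duration, word_duration]` pair in seconds), and it reports perplexity as the exponential of the mean negative log-probability of each next word. The class name `ProsodyRNNLM`, the feature layout, the sigmoid hidden units, and the random weights are all illustrative assumptions.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class ProsodyRNNLM:
    """Hypothetical Elman RNNLM with an extra prosody-feature input.

    Untrained: weights are small random values, so outputs are near uniform.
    """

    def __init__(self, vocab_size, hidden_size, feat_size, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.U = init(hidden_size, vocab_size)   # current word (one-hot) -> hidden
        self.W = init(hidden_size, hidden_size)  # previous hidden -> hidden
        self.F = init(hidden_size, feat_size)    # prosody features -> hidden (assumed)
        self.V = init(vocab_size, hidden_size)   # hidden -> next-word logits

    def step(self, word_id, feats, h_prev):
        """Return P(next word | history, prosody) and the new hidden state."""
        # Multiplying U by a one-hot vector just selects column word_id of U.
        word_in = [row[word_id] for row in self.U]
        rec_in = matvec(self.W, h_prev)
        feat_in = matvec(self.F, feats)
        h = [sigmoid(a + b + c) for a, b, c in zip(word_in, rec_in, feat_in)]
        return softmax(matvec(self.V, h)), h

    def perplexity(self, words, feats):
        """exp of the mean negative log-probability of each next word."""
        h = [0.0] * self.hidden_size
        nll = 0.0
        for t in range(len(words) - 1):
            probs, h = self.step(words[t], feats[t], h)
            nll -= math.log(probs[words[t + 1]])
        return math.exp(nll / (len(words) - 1))


# Toy usage with made-up word ids and [pause_duration_s, word_duration_s] per word.
model = ProsodyRNNLM(vocab_size=10, hidden_size=8, feat_size=2)
words = [1, 4, 2, 7, 3]
feats = [[0.0, 0.21], [0.35, 0.40], [0.0, 0.18], [0.05, 0.30], [0.6, 0.25]]
print(f"perplexity: {model.perplexity(words, feats):.2f}")
```

In the context-dependent RNNLM of Mikolov and Zweig the feature vector also connects directly to the output layer; this sketch omits that connection for brevity. A word-level and a syllable-level variant would differ only in how each word's feature vector is computed (for example, by averaging syllable durations or F0 over the word), which is again an assumption rather than a detail taken from the paper.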
Original language: English
Title of host publication: INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association
Number of pages: 5
Publication status: Published, Sep 2015
