Abstract
Recurrent neural network language models (RNNLMs) have been shown to consistently reduce the word error rates (WERs) of large vocabulary speech recognition tasks. In this work we propose to enhance RNNLMs with prosodic features computed using the context of the current word. Since prosodic features can plausibly be computed at both the word and the syllable level, we trained models on features computed at each of these levels. To investigate the effectiveness of the proposed models, we report perplexity and WER for two speech recognition tasks, Switchboard and TED. We observed substantial improvements in perplexity and small improvements in WER.
Index Terms: RNNLMs, 3-gram, prosody features, pause duration, duration of the word, syllable duration, syllable F0, GMM-HMM, DNN-HMM, Switchboard conversations and TED lectures
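The abstract does not say how the prosodic features enter the network. The following is a minimal sketch that assumes the simplest option: concatenate a per-word prosodic feature vector (for example pause duration and word duration) with the one-hot word input of an Elman-style RNNLM, then score a word sequence by perplexity, the exponential of the negative mean log-probability assigned to each next word. The class `ProsodyRNNLM`, the layer sizes, the random weights and the toy feature values are all hypothetical illustrations. They are not the authors' model and not data from the paper.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    mx = max(z)
    exps = [math.exp(v - mx) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class ProsodyRNNLM:
    """Hypothetical Elman RNN LM whose input is [one-hot word ; prosody features].

    Assumption: prosody is injected by simple input concatenation; the paper's
    actual architecture may differ.
    """

    def __init__(self, vocab_size, n_prosody, hidden_size, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small random weights; an untrained model is close to uniform.
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.W_in = init(hidden_size, vocab_size + n_prosody)
        self.W_rec = init(hidden_size, hidden_size)
        self.W_out = init(vocab_size, hidden_size)

    def step(self, word_id, prosody, h):
        """One time step: returns (next-word distribution, new hidden state)."""
        x = [0.0] * self.vocab_size
        x[word_id] = 1.0
        x += list(prosody)  # append prosodic features to the word input
        a = [i + r for i, r in zip(matvec(self.W_in, x), matvec(self.W_rec, h))]
        h_new = [1.0 / (1.0 + math.exp(-v)) for v in a]  # sigmoid hidden layer
        return softmax(matvec(self.W_out, h_new)), h_new


def perplexity(model, words, prosody_feats):
    """exp(-1/N * sum log P(w_{t+1} | w_1..w_t, prosody_1..prosody_t))."""
    h = [0.0] * model.hidden_size
    log_prob = 0.0
    n = len(words) - 1
    for t in range(n):
        probs, h = model.step(words[t], prosody_feats[t], h)
        log_prob += math.log(probs[words[t + 1]])
    return math.exp(-log_prob / n)


# Toy usage: 5-word vocabulary, 2 made-up word-level features [pause_s, duration_s].
model = ProsodyRNNLM(vocab_size=5, n_prosody=2, hidden_size=4)
words = [0, 3, 1, 4]
feats = [[0.12, 0.30], [0.00, 0.21], [0.45, 0.38], [0.05, 0.25]]
print(perplexity(model, words, feats))
```

Because the weights are untrained, the printed perplexity sits near the vocabulary size. Syllable-level features (e.g. syllable duration and F0) would need to be pooled to a fixed-length vector per word before concatenation; how the paper does this is not stated in the abstract.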
| Original language | English |
|---|---|
| Title of host publication | INTERSPEECH 2015 16th Annual Conference of the International Speech Communication Association |
| Pages | 2390-2394 |
| Number of pages | 5 |
| Publication status | Published - Sept 2015 |
Projects
- User Generated Dialogue System: uDialogue (finished)
  Renals, S. (Principal Investigator) & Yamagishi, J. (Co-investigator)
  1/04/15 → 31/03/16
  Project: Research