This paper describes a novel method for automatically inserting filled pauses (e.g., UM) into fluent texts. Although filled pauses are known to serve a wide range of psychological and structural functions in conversational speech, they have not traditionally been modelled overtly by state-of-the-art speech synthesis systems. However, several recent systems have started to model disfluencies specifically, and so there is an increasing need to create disfluent speech synthesis input by automatically inserting filled pauses into otherwise fluent text. The approach presented here interpolates Ngrams and Full-Output Recurrent Neural Network Language Models (f-RNNLMs) in a lattice-rescoring framework. It is shown that the interpolated system outperforms separate Ngram and f-RNNLM systems, where performance is analysed using the Precision, Recall, and F-score metrics.
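As a rough illustration of two ideas named in the abstract, the sketch below combines two language-model probabilities and scores filled-pause insertions with Precision, Recall, and F-score. It is not the authors' implementation: the abstract does not state the interpolation formula, so linear interpolation with a weight `lam` is assumed here, and the function names, the `UM` token, and the position-matching rule are all hypothetical choices made for the example. Lattice rescoring itself is not shown.

```python
def interpolate(p_ngram, p_rnnlm, lam=0.5):
    """Linearly interpolate two probabilities for the same word.

    Assumption: linear interpolation; lam is the weight on the Ngram model.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_ngram + (1.0 - lam) * p_rnnlm


def fp_positions(tokens, fp="UM"):
    """Return the set of fluent-word indices before which a filled pause occurs.

    Consecutive filled pauses at the same slot collapse into one position.
    """
    positions = set()
    idx = 0  # index of the next fluent (non-pause) word
    for tok in tokens:
        if tok == fp:
            positions.add(idx)
        else:
            idx += 1
    return positions


def precision_recall_f(reference, hypothesis, fp="UM"):
    """Score hypothesised filled-pause positions against a reference."""
    ref = fp_positions(reference, fp)
    hyp = fp_positions(hypothesis, fp)
    tp = len(ref & hyp)  # pauses inserted at a correct slot
    precision = tp / len(hyp) if hyp else 0.0
    recall = tp / len(ref) if ref else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    reference = "I UM think UM so".split()
    hypothesis = "UM I UM think so".split()
    print(precision_recall_f(reference, hypothesis))
```

Scoring by slot position relative to the fluent words means the reference and the hypothesis can be compared even when they contain different numbers of filled pauses.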
|Title of host publication||Proc. of DiSS 2015, The 7th Workshop on Disfluencies in Spontaneous Speech|
|Place of Publication||Edinburgh|
|Number of pages||4|
|Publication status||Published - 10 Aug 2015|
- Keywords: Disfluency, Filled Pauses, f-RNNLMs, Ngrams, Lattices