A Lattice-based Approach to Automatic Filled Pause Insertion

Marcus Tomalin, Mirjam Wester, Rasmus Dall, Bill Byrne, Simon King

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

This paper describes a novel method for automatically inserting filled pauses (e.g., UM) into fluent texts. Although filled pauses are known to serve a wide range of psychological and structural functions in conversational speech, they have not traditionally been modelled overtly by state-of-the-art speech synthesis systems. However, several recent systems have started to model disfluencies explicitly, so there is an increasing need to create disfluent speech synthesis input by automatically inserting filled pauses into otherwise fluent text. The approach presented here interpolates Ngrams and Full-Output Recurrent Neural Network Language Models (f-RNNLMs) in a lattice-rescoring framework. The interpolated system is shown to outperform separate Ngram and f-RNNLM systems, with performance measured using the Precision, Recall, and F-score metrics.
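The pipeline described in the abstract (score candidate filled-pause insertions with an interpolated language model, then evaluate the chosen insertions with Precision, Recall, and F-score) can be illustrated with a minimal Python sketch. It assumes bigram models and a simple lattice in which each word gap offers two arcs, one direct and one through a single UM. The helpers `table_lm`, `insert_filled_pauses`, `render`, and `precision_recall_f`, the `FP` token, `lam=0.5`, and all probabilities are hypothetical toy values, and the second table merely stands in for an f-RNNLM. A real f-RNNLM conditions on the full history, so choices at different gaps interact and genuine lattice rescoring is needed; this is not the authors' implementation, and their scoring protocol may differ from the per-position scoring used here.

```python
import math
from typing import Callable, Dict, List, Sequence, Set, Tuple

FP = "UM"  # hypothetical filled-pause token
BigramLM = Callable[[str, str], float]  # returns P(word | prev)


def table_lm(probs: Dict[Tuple[str, str], float], floor: float = 1e-4) -> BigramLM:
    """Toy bigram LM: table lookup with a floor probability for unseen pairs."""
    return lambda word, prev: probs.get((prev, word), floor)


def interpolate(lm_a: BigramLM, lm_b: BigramLM, lam: float) -> BigramLM:
    """Linear interpolation: P(w|h) = lam * P_a(w|h) + (1 - lam) * P_b(w|h)."""
    return lambda word, prev: lam * lm_a(word, prev) + (1.0 - lam) * lm_b(word, prev)


def insert_filled_pauses(words: Sequence[str], lm: BigramLM) -> Set[int]:
    """Return the indices i at which FP is inserted before words[i].

    Every gap has two arcs: straight to words[i], or through FP first.
    With a bigram LM the context after each gap is words[i] either way,
    so the best path is found by comparing the two arcs gap by gap.
    """
    inserted = set()
    prev = "<s>"
    for i, word in enumerate(words):
        direct = math.log(lm(word, prev))
        via_fp = math.log(lm(FP, prev)) + math.log(lm(word, FP))
        if via_fp > direct:
            inserted.add(i)
        prev = word
    return inserted


def render(words: Sequence[str], inserted: Set[int]) -> str:
    """Rebuild the disfluent word string from the chosen insertion points."""
    out: List[str] = []
    for i, word in enumerate(words):
        if i in inserted:
            out.append(FP)
        out.append(word)
    return " ".join(out)


def precision_recall_f(predicted: Set[int], reference: Set[int]) -> Tuple[float, float, float]:
    """Score predicted insertion points against reference insertion points."""
    tp = len(predicted & reference)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f


# Hypothetical toy probabilities; the second table only stands in for an f-RNNLM.
ngram = table_lm({("<s>", "i"): 0.2, ("i", "think"): 0.3, ("think", "that"): 0.2,
                  ("that", "is"): 0.3, ("is", "fine"): 0.2, ("think", FP): 0.2,
                  (FP, "that"): 0.6, ("is", FP): 0.1, (FP, "fine"): 0.3})
rnn_stand_in = table_lm({("<s>", "i"): 0.2, ("i", "think"): 0.3, ("think", "that"): 0.1,
                         ("that", "is"): 0.3, ("is", "fine"): 0.2, ("think", FP): 0.4,
                         (FP, "that"): 0.6, ("is", FP): 0.3, (FP, "fine"): 0.5})

words = ["i", "think", "that", "is", "fine"]
inserted = insert_filled_pauses(words, interpolate(ngram, rnn_stand_in, lam=0.5))
print(render(words, inserted))  # i think UM that is fine
p, r, f = precision_recall_f(inserted, reference={2, 4})
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # P=1.00 R=0.50 F=0.67
```

In this toy data, raising `lam` above roughly 0.64 shifts enough weight to the Ngram table that the insertion before "that" disappears, which shows how the interpolation weight controls how readily filled pauses are inserted.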

Original language: English
Title of host publication: Proc. of DiSS 2015, The 7th Workshop on Disfluencies in Spontaneous Speech
Place of publication: Edinburgh
Number of pages: 4
Publication status: Published - 10 Aug 2015

Keywords

  • Disfluency, Filled Pauses, f-RNNLMs, Ngrams, Lattices
