The Temporal Delay Hypothesis: Natural, Vocoded and Synthetic Speech

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract / Description of output

Including disfluencies in synthetic speech is being explored as a way of making synthetic speech sound more natural and conversational. How to measure whether the resulting speech is actually more natural, however, is not straightforward. Conventional approaches to synthetic speech evaluation fall short because listeners are either primed to prefer stimuli with filled pauses or, when they are not primed, prefer more fluent speech. Psycholinguistic reaction time experiments may circumvent this issue. In this paper, we revisit one such reaction time experiment. For natural speech, delays in word onset were found to facilitate word recognition regardless of the type of delay, be it a filled pause (um), silence or a tone. We extend these experiments by examining the effect of using vocoded and synthetic speech. Our results partially replicate previous findings. For natural and vocoded speech, if the delay is a silent pause, significant increases in the speed of word recognition are found. If the delay comprises a filled pause, there is a significant increase in reaction time for vocoded speech but not for natural speech. For synthetic speech, no clear effects of delay on word recognition are found. We hypothesise this is because it takes longer (requires more cognitive resources) to process synthetic speech than natural or vocoded speech.
Original language: English
Title of host publication: Proceedings of DiSS, The 7th Workshop on Disfluencies in Spontaneous Speech
Number of pages: 5
Publication status: Published - 8 Aug 2015

Keywords / Materials (for Non-textual outputs)

  • delay hypothesis, disfluency
