Listeners' weighting of acoustic cues to synthetic speech naturalness: A multidimensional scaling analysis

Catherine Mayo, Robert A.J. Clark, Simon King

Research output: Contribution to journal › Article › peer-review

Abstract / Description of output

The quality of current commercial speech synthesis systems is now so high that system improvements are being made at subtle sub- and supra-segmental levels. Human perceptual evaluation of such subtle improvements requires a highly sophisticated level of perceptual attention to specific acoustic characteristics or cues. However, it is not well understood what acoustic cues listeners attend to by default when asked to evaluate synthetic speech. It may therefore be quite difficult to design an evaluation method that allows listeners to concentrate on only one dimension of the signal while ignoring others that are perceptually more important to them.

The aim of the current study was to determine which acoustic characteristics of unit-selection synthetic speech are most salient to listeners when evaluating the naturalness of such speech. This study made use of multidimensional scaling techniques to analyse listeners’ pairwise comparisons of synthetic speech sentences. Results indicate that listeners place a great deal of perceptual importance on the presence of artifacts and discontinuities in the speech, somewhat less importance on aspects of segmental quality, and very little importance on stress/intonation appropriateness. These relative differences in importance will impact on listeners’ ability to attend to these different acoustic characteristics of synthetic speech, and should therefore be taken into account when designing appropriate methods of synthetic speech evaluation.
Original language: English
Pages (from-to): 311-326
Number of pages: 15
Journal: Speech Communication
Volume: 53
Issue number: 3
Publication status: Published - 1 Mar 2011

Keywords / Materials (for Non-textual outputs)

  • Speech synthesis
  • Evaluation
  • Speech perception
