Abstract
The exploration of uncanny valley effects (UVE) - a distaste for entities that appear almost, but not quite, human - has been a productive topic of research in human-robot interaction. Meanwhile, realistic text-to-speech (TTS) voices are increasingly encountered in various settings. In this work, we aim to describe the relationship between synthesised voices’ perceived human-likeness and pleasantness and seek evidence of auditory UVE in listeners’ evaluations. In an online between-subjects experiment, listeners rated an array of manipulated TTS voices, trained using a single speaker’s data. The evidence obtained is compatible with a slight plateau in a generally positive correlation between realism and approval. All the TTS voices used received average ratings below 50% for ‘human-likeness’, and therefore conclusions about UVE, i.e. negative reactions to voices perceived as very human-like, cannot be drawn from these data. Our results suggest that, although a correlation exists, high realism may not be necessary for relatively high approval; on average, voices with decreased pitch variation were rated about twice as highly for being ‘pleasant’ and ‘friendly’ as they were for being ‘like a human’. The relationship between pitch variation and perceived realism is examined and identified as a direction for further research.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of Speech Prosody 2024 |
| Editors | Yiya Chen, Aoju Chen, Amalia Arvaniti |
| Publisher | International Speech Communication Association (ISCA) |
| Pages | 1115-1119 |
| Number of pages | 5 |
| Publication status | Published - 2024 |
| Event | Speech Prosody 2024, Leiden, Netherlands (2 Jul 2024 → 5 Jul 2024), https://www.universiteitleiden.nl/sp2024 |
Publication series
| Name | Speech Prosody |
|---|---|
| Publisher | International Speech Communication Association (ISCA) |
| ISSN (Electronic) | 2333-2042 |
Conference
| Conference | Speech Prosody 2024 |
|---|---|
| City | Leiden |
| Period | 2/07/24 → 5/07/24 |
| Internet address | https://www.universiteitleiden.nl/sp2024 |
Keywords
- speech synthesis
- speech prosody
- pitch variation
- human-computer interaction
- TTS evaluation