Proper Name Splicing in Computer Games with TTS

Blaise Potard, Matthew P. Aylett, Christopher J. Pidcock

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Building high-quality synthesis systems with an open-domain vocabulary and a small audio database is a challenging problem, even when the targeted application is well constrained. Monophone unit concatenation (as opposed to diphone) is an approach that can compensate for the poor unit coverage that a small database implies. However, joining at phone boundaries is a delicate task that requires accurate targeting. In this paper, we present an automatically trained targeting system based on the parametric synthesiser HTS, and compare it to a concatenative monophone system and a baseline concatenative diphone system. We apply a novel evaluation methodology which includes a qualitative component and allows for fast incremental development of synthesis systems. Preliminary results show that although the hybrid system performed significantly worse on out-of-database items, it is less affected by segmentation errors than the monophone system.
Original language: English
Title of host publication: INTERSPEECH 2012, 13th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, September 9-13, 2012
Pages: 2222-2225
Number of pages: 4
Publication status: Published - 2012
