A Hierarchical Predictor of Synthetic Speech Naturalness Using Neural Networks

Takenori Yoshimura, Gustav Eje Henter, Oliver Watts, Mirjam Wester, Junichi Yamagishi, Keiichi Tokuda

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A problem when developing and tuning speech synthesis systems is that there is no well-established method of automatically rating the quality of the synthetic speech. This research attempts to obtain a new automated measure which is trained on the result of large-scale subjective evaluations employing many human listeners, i.e., the Blizzard Challenge. To exploit the data, we experiment with linear regression, feed-forward and convolutional neural network models, and combinations of them to regress from synthetic speech to the perceptual scores obtained from listeners. The biggest improvements were seen when combining stimulus- and system-level predictions.
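
The idea of combining stimulus-level and system-level predictions can be illustrated with a minimal sketch. The abstract does not say how the two levels are combined, so everything below is an assumption made for illustration: a single hypothetical acoustic feature per stimulus, a one-variable least-squares regressor standing in for the paper's models, a system-level score taken as the mean of that system's stimulus predictions, and a fixed blending weight. The names `fit_line`, `predict_hierarchical`, and `weight`, and the toy numbers, are not from the paper.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y ~ a*x + b with one input feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def predict_hierarchical(stimuli, a, b, weight=0.5):
    """Blend each stimulus-level prediction with its system's mean prediction.

    stimuli: list of (system_id, feature) pairs.
    weight:  share given to the stimulus-level prediction (assumed value).
    """
    # Stimulus level: one predicted score per utterance.
    stim_pred = [a * x + b for _, x in stimuli]

    # System level: average the stimulus predictions within each system.
    by_system = defaultdict(list)
    for (system_id, _), p in zip(stimuli, stim_pred):
        by_system[system_id].append(p)
    sys_pred = {s: mean(ps) for s, ps in by_system.items()}

    # Combination: weighted average of the two levels (an assumption).
    return [
        weight * p + (1 - weight) * sys_pred[s]
        for (s, _), p in zip(stimuli, stim_pred)
    ]


# Toy training data: hypothetical feature values and listener scores.
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0])

# Three hypothetical test stimuli from two systems, "A" and "B".
blended = predict_hierarchical([("A", 1.0), ("A", 3.0), ("B", 2.0)], a, b)
```

In this sketch the blend pulls each stimulus prediction toward its system's average, which is one simple way a system-level estimate could stabilise noisy per-utterance estimates.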
Original language: English
Title of host publication: Interspeech 2016
Publisher: International Speech Communication Association
Pages: 342-346
Number of pages: 5
Publication status: Published - 12 Sep 2016
Event: Interspeech 2016 - San Francisco, United States
Duration: 8 Sep 2016 → 12 Sep 2016
http://www.interspeech2016.org/

Publication series

Publisher: International Speech Communication Association
ISSN (Print): 1990-9772

Conference

Conference: Interspeech 2016
Country/Territory: United States
City: San Francisco
Period: 8/09/16 → 12/09/16
