Combining Lightly-supervised Learning and User Feedback to Construct and Improve a Statistical Parametric Speech Synthesizer for Malay

Lau Chee Yong, Oliver Watts, Simon King

Research output: Contribution to journal > Article > peer-review

Abstract / Description of output

In this study, we aim to reduce the human effort in preparing training data for synthesizing human speech and improve the quality of synthetic speech. In spite of the learning-from-data used to train the statistical models, the construction of a statistical parametric speech synthesizer involves substantial human effort, especially when using imperfect data or working on a new language. Here, we use lightly-supervised methods for preparing the data and constructing the text-processing front end. This initial system is then iteratively improved using active learning in which feedback from users is used to disambiguate the pronunciation system in our chosen language, Malay. The data are prepared using speaker diarisation and lightly-supervised text-speech alignment. In the front end, grapheme-based units are used. The active learning used small amounts of feedback from a listener to train a classifier. We report evaluations of two systems built from high-quality studio data and lower-quality 'found' data respectively and show that the intelligibility of each can be improved using active learning.
Original language: English
Pages (from-to): 1227-1232
Number of pages: 6
Journal: Research Journal of Applied Sciences, Engineering and Technology
Volume: 11
Issue number: 11
Publication status: Published - 15 Dec 2015
