A Minimum V/U Error Approach to F0 Generation in HMM-Based TTS

Yao Qian, F.K. Soong, Miaomiao Wang, Zhizheng Wu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

HMM-based TTS can produce highly intelligible speech of decent quality. However, the HMM degrades when the feature vectors used in training are noisy. Among all noisy features, pitch tracking errors and the corresponding flawed voiced/unvoiced (v/u) decisions are identified as two key factors in voice quality problems. In this paper, we propose a minimum v/u error approach to F0 generation. Prior knowledge of v/u is imposed on each Mandarin phone, and accumulated v/u posterior probabilities are used to search for the optimal v/u switching point in each VU or UV segment during generation. Objectively, the new approach is shown to improve v/u prediction performance, specifically for voiced-to-unvoiced swapping errors, which are reduced from 3.7% (baseline) to 2.0% (new approach). The improvement is also confirmed subjectively by an AB preference test: 72% (new approach) versus 22% (baseline).
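
The switching-point search mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulation, so everything here is an assumption made for illustration: per-frame voiced posteriors are taken as given, a segment is constrained to exactly one switch (VU or UV), and the cost is the expected number of frame-level v/u errors computed from accumulated posteriors. The function name best_switch_point and its parameters are hypothetical.

```python
from itertools import accumulate


def best_switch_point(voiced_post, order="VU"):
    """Pick the single v/u switching frame that minimizes expected v/u errors.

    voiced_post: per-frame posterior probability of "voiced" (assumed given).
    order: "VU" (voiced then unvoiced) or "UV" (unvoiced then voiced).
    Returns (k, cost): frames [0, k) get the first label, frames [k, n) the second.
    This is an illustrative formulation, not the paper's exact algorithm.
    """
    if order not in ("VU", "UV"):
        raise ValueError("order must be 'VU' or 'UV'")

    # Posterior that each frame belongs to the FIRST class of the segment.
    first = [p if order == "VU" else 1.0 - p for p in voiced_post]
    n = len(first)

    # Accumulated posteriors: prefix sums over the first and second class.
    acc_first = [0.0] + list(accumulate(first))
    acc_second = [0.0] + list(accumulate(1.0 - q for q in first))

    # Expected errors for a switch at k:
    #   frames [0, k) labelled first class  -> error mass acc_second[k]
    #   frames [k, n) labelled second class -> error mass acc_first[n] - acc_first[k]
    costs = [acc_second[k] + (acc_first[n] - acc_first[k]) for k in range(n + 1)]

    k = min(range(n + 1), key=costs.__getitem__)
    return k, costs[k]


if __name__ == "__main__":
    # Hypothetical posteriors for a five-frame VU segment.
    print(best_switch_point([0.9, 0.8, 0.7, 0.2, 0.1], order="VU"))
```

Because the prefix sums are computed once, every candidate switching point is scored in constant time, so the search is linear in the segment length. Allowing k = 0 or k = n lets a segment come out entirely unvoiced or entirely voiced when the posteriors favor that.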
Original language: English
Title of host publication: INTERSPEECH 2009, 10th Annual Conference of the International Speech Communication Association
Pages: 408-411
Number of pages: 4
Publication status: Published - 2009
