Edinburgh Research Explorer

Soft context clustering for F0 modeling in HMM-based speech synthesis

Research output: Contribution to journal › Article

Open Access permissions

Open

Documents

  • Download as Adobe PDF

    Rights statement: © Khorram, S., Sameti, H., & King, S. (2015). Soft context clustering for F0 modeling in HMM-based speech synthesis. EURASIP Journal on Advances in Signal Processing, 2015(1). 10.1186/1687-6180-2015-2

    Final published version, 2 MB, PDF-document

    Licence: Creative Commons: Attribution (CC-BY)

http://asp.eurasipjournals.com/content/2015/1/2#refs
Original language: English
Journal: EURASIP Journal on Advances in Signal Processing
Volume: 2015
Issue number: 1
DOI: 10.1186/1687-6180-2015-2
Publication status: Published - 9 Jan 2015

Abstract

This paper proposes the use of a new binary decision tree, which we call a soft decision tree, to improve generalization performance compared to the conventional ‘hard’ decision tree method that is used to cluster context-dependent model parameters in statistical parametric speech synthesis. We apply the method to improve the modeling of fundamental frequency (F0), which is an important factor in synthesizing natural-sounding, high-quality speech. Conventionally, hard decision tree-clustered hidden Markov models (HMMs) are used, in which each model parameter is assigned to a single leaf node. However, this ‘divide-and-conquer’ approach leads to data sparsity and therefore to poor generalization: it cannot accurately predict parameters for models of unseen contexts, so the hard decision tree is a weak function approximator. To alleviate this, we propose the soft decision tree, which is a binary decision tree with soft decisions at the internal nodes. In this soft clustering method, internal nodes select both of their children with certain membership degrees; therefore, each node can be viewed as a fuzzy set with a context-dependent membership function. The soft decision tree improves model generalization and provides a superior function approximator because it is able to assign each context to several overlapping leaves. In order to use such a soft decision tree to predict the parameters of the HMM output probability distribution, we derive the smoothest (maximum entropy) distribution that captures all partial first-order moments and a global second-order moment of the training samples. Employing such a soft decision tree architecture with maximum entropy distributions, a novel speech synthesis system is trained using maximum likelihood (ML) parameter re-estimation, and synthesis is achieved via maximum output probability parameter generation. In addition, a soft decision tree construction algorithm optimizing a log-likelihood measure is developed. Both subjective and objective evaluations were conducted and indicate a considerable improvement over the conventional method.
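
The soft-decision idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sigmoid gate, the numeric context encoding, the tree shape, and all names and values (`Node`, `gate`, `leaf_memberships`, `predict_mean`, the leaf means) are assumptions chosen for illustration, and the maximum entropy distribution and ML training from the paper are not shown. The sketch only demonstrates that every internal node passes a context to both children with a membership degree, so each context belongs to several leaves at once and the predicted parameter is a membership-weighted mix of leaf parameters.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Node:
    """A binary tree node; it is a leaf when both children are None."""
    name: str
    weights: Optional[Sequence[float]] = None  # gate parameters (internal nodes)
    bias: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    mean: float = 0.0  # leaf parameter, e.g. a log-F0 mean (hypothetical values)


def is_leaf(node: Node) -> bool:
    return node.left is None and node.right is None


def gate(node: Node, context: Sequence[float]) -> float:
    """Membership degree in (0, 1) of the right child.

    Assumption: a sigmoid of a linear function of the context features.
    A hard decision tree is the limiting case where this is exactly 0 or 1.
    """
    z = node.bias + sum(w * x for w, x in zip(node.weights, context))
    return 1.0 / (1.0 + math.exp(-z))


def leaf_memberships(node: Node, context: Sequence[float],
                     degree: float = 1.0,
                     out: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Return the membership degree of `context` in every leaf.

    A leaf's degree is the product of the gate values along its path,
    so the degrees over all leaves sum to 1.
    """
    if out is None:
        out = {}
    if is_leaf(node):
        out[node.name] = degree
        return out
    g = gate(node, context)
    leaf_memberships(node.left, context, degree * (1.0 - g), out)
    leaf_memberships(node.right, context, degree * g, out)
    return out


def predict_mean(node: Node, context: Sequence[float]) -> float:
    """Membership-weighted combination of the leaf means."""
    if is_leaf(node):
        return node.mean
    g = gate(node, context)
    return ((1.0 - g) * predict_mean(node.left, context)
            + g * predict_mean(node.right, context))


# Hypothetical three-leaf tree over a two-feature context vector.
root = Node(
    name="root", weights=[2.0, 0.0], bias=-1.0,
    left=Node(name="A", mean=4.8),
    right=Node(
        name="inner", weights=[0.0, 3.0], bias=0.0,
        left=Node(name="B", mean=5.0),
        right=Node(name="C", mean=5.3),
    ),
)

context = [1.0, 0.5]
print(leaf_memberships(root, context))
print(predict_mean(root, context))
```

Because every leaf receives a non-zero degree, a context that was never seen in training still draws on several leaves rather than on a single, possibly data-starved one, which is the generalization argument the abstract makes.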

    Research areas

  • Context clustering, Decision tree-based clustering, F0 modeling, Hidden Markov model, HMM, HMM-based speech synthesis, Maximum entropy model, Soft context clustering, Soft decision tree, Statistical parametric speech synthesis

ID: 19840956