Conditional restricted Boltzmann machine for voice conversion

Zhizheng Wu, E. S. Chng, Haizhou Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Conventional statistical transformation functions for voice conversion have been shown to suffer from over-smoothing and over-fitting. Over-smoothing arises from the statistical averaging involved in estimating the model parameters of the transformation function. In addition, the large number of parameters in the statistical model cannot be estimated well from limited parallel training data, which results in over-fitting. In this work, we investigate a robust transformation function for voice conversion based on a conditional restricted Boltzmann machine. The conditional restricted Boltzmann machine, which performs linear and non-linear transformations simultaneously, is proposed to learn the relationship between source and target speech. The CMU ARCTIC corpus is used for experimental validation, with the number of parallel training utterances varied from 2 to 40. Across these training conditions, two objective evaluation measures, mel-cepstral distortion and correlation coefficient, both show that the proposed method consistently outperforms the mainstream joint density Gaussian mixture model method.
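The two objective measures named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes the commonly used definition of mel-cepstral distortion in dB, a Pearson correlation computed per cepstral dimension and averaged, and target and converted frames that are already time-aligned. The function names and the choice of which coefficients to pass in are hypothetical.

```python
import math


def mel_cepstral_distortion(target, converted):
    """Mean mel-cepstral distortion (dB) over time-aligned frames.

    Both arguments are lists of frames; each frame is a list of
    mel-cepstral coefficients. Assumption: the caller has already
    selected the coefficients to compare and aligned the frames.
    """
    scale = 10.0 / math.log(10.0)
    per_frame = [
        scale * math.sqrt(2.0 * sum((t - c) ** 2 for t, c in zip(tf, cf)))
        for tf, cf in zip(target, converted)
    ]
    return sum(per_frame) / len(per_frame)


def mean_correlation(target, converted):
    """Pearson correlation per dimension along time, averaged over dimensions.

    Assumption: no dimension is constant over time, otherwise the
    denominator is zero.
    """
    n_dims = len(target[0])
    total = 0.0
    for d in range(n_dims):
        t = [frame[d] for frame in target]
        c = [frame[d] for frame in converted]
        t_mean = sum(t) / len(t)
        c_mean = sum(c) / len(c)
        cov = sum((a - t_mean) * (b - c_mean) for a, b in zip(t, c))
        t_var = sum((a - t_mean) ** 2 for a in t)
        c_var = sum((b - c_mean) ** 2 for b in c)
        total += cov / math.sqrt(t_var * c_var)
    return total / n_dims


# Toy data: three frames of two coefficients, converted offset by 1.0.
target_frames = [[1.0, 2.0], [2.0, 4.0], [3.0, 5.0]]
converted_frames = [[2.0, 3.0], [3.0, 5.0], [4.0, 6.0]]
mcd = mel_cepstral_distortion(target_frames, converted_frames)
corr = mean_correlation(target_frames, converted_frames)
```

Under these definitions, a lower distortion and a higher correlation both indicate converted speech that is closer to the target.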
Original language: English
Title of host publication: Signal and Information Processing (ChinaSIP), 2013 IEEE China Summit & International Conference on
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 104-108
Number of pages: 4
ISBN (Print): 978-1-4799-1043-4
Publication status: Published - 2013
