Phone Duration Modeling Using Gradient Tree Boosting

Junichi Yamagishi, Hisashi Kawai, Takao Kobayashi

Research output: Contribution to journal › Article › peer-review

Abstract / Description of output

In text-to-speech synthesis systems, phone duration influences the quality and naturalness of synthetic speech. In this study, we incorporate an ensemble learning technique called gradient tree boosting into phone duration modeling as an alternative to the conventional approach using regression trees, and objectively evaluate its prediction accuracy for Japanese, Mandarin, and English phone durations. The gradient tree boosting algorithm is a meta-algorithm built on regression trees: it iteratively fits a regression tree to the residuals of the current model and outputs a weighted sum of the resulting trees. Our evaluation results show that, compared with regression trees and related techniques, gradient tree boosting substantially and robustly improves the predictive accuracy of phone duration regardless of language, speaker, or domain.
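
The residual-fitting loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes squared-error loss, depth-1 regression trees (stumps) on a single numeric feature, and a fixed shrinkage weight. The function names, the feature (a hypothetical "position in phrase" index), and the duration values are invented toy data for illustration only.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree on one numeric feature by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # Only one distinct feature value: fall back to a constant leaf.
        constant = mean(residuals)
        return xs[0], constant, constant
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosting(xs, ys, n_trees=50, learning_rate=0.1):
    """Repeatedly fit a stump to the current residuals and add it with a fixed weight."""
    base = mean(ys)  # initial model: the mean duration
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps


def predict_boosting(model, x):
    """Output the base value plus the weighted sum of all stump predictions."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(ys, predictions):
    return mean((y - p) ** 2 for y, p in zip(ys, predictions))


# Invented toy data: feature = hypothetical position index, target = duration in ms.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [60, 62, 65, 70, 90, 95, 110, 120]

model = fit_boosting(xs, ys)
baseline_mse = mse(ys, [mean(ys)] * len(ys))
boosted_mse = mse(ys, [predict_boosting(model, x) for x in xs])
print(f"baseline MSE: {baseline_mse:.2f}, boosted MSE: {boosted_mse:.2f}")
```

A real duration model would use many linguistic context features and deeper trees. The shrinkage weight and the number of trees here are arbitrary choices, not values taken from the paper.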
Original language: English
Pages (from-to): 405-415
Number of pages: 11
Journal: Speech Communication
Issue number: 5
Publication status: Published - May 2008

Keywords / Materials (for Non-textual outputs)

  • Text-to-speech synthesis
  • Phone duration modeling
  • Gradient tree boosting

