A technique for controlling voice quality of synthetic speech using multiple regression HSMM

Makoto Tachibana*, Takashi Nose, Junichi Yamagishi, Takao Kobayashi

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

This paper describes a technique for controlling the voice quality of synthetic speech using a multiple regression hidden semi-Markov model (HSMM). The technique assumes that the mean vectors of the output and state duration distributions of the HSMM are modeled by multiple regression on a parameter vector called the voice quality control vector. We first choose three features for controlling voice quality: "smooth voice - nonsmooth voice," "warm - cold," and "high-pitched - low-pitched." We then attempt to control the voice quality of synthetic speech along these features. Results from several subjective tests show that the proposed technique can change these features of voice quality intuitively.
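The regression assumption in the abstract can be illustrated with a minimal sketch. It assumes the common multiple-regression form in which a state's mean vector is a regression matrix times the control vector augmented with a leading 1 for the bias term. The function name `regress_mean`, the matrix `H_example`, and all numeric values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def regress_mean(H: Sequence[Sequence[float]], control: Sequence[float]) -> List[float]:
    """Return mu = H * [1, v_1, ..., v_L] for a single state.

    H is a hypothetical regression matrix with one row per feature
    dimension and len(control) + 1 columns (bias column first).
    """
    xi = [1.0, *control]  # augmented control vector; the leading 1 is an assumed bias term
    if any(len(row) != len(xi) for row in H):
        raise ValueError("each row of H needs len(control) + 1 columns")
    return [sum(h * x for h, x in zip(row, xi)) for row in H]


# Illustrative values only: a 2-dimensional mean and three control axes,
# standing in for "smooth - nonsmooth", "warm - cold", "high-pitched - low-pitched".
H_example = [
    [5.0, 0.5, -0.25, 1.0],
    [2.0, 0.0, 0.5, -0.5],
]

neutral = regress_mean(H_example, [0.0, 0.0, 0.0])  # control at the origin leaves only the bias column
shifted = regress_mean(H_example, [1.0, 0.0, 2.0])  # moving along the first and third axes shifts the mean
```

In this sketch, changing the control vector moves every state mean linearly, which is one way to read how a single low-dimensional vector could steer both spectral and duration parameters.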

Original language: English
Title of host publication: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Publisher: International Speech Communication Association
Pages: 2438-2441
Number of pages: 4
ISBN (Print): 9781604234497
Publication status: Published - 2006
Event: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP - Pittsburgh, PA, United States
Duration: 17 Sept 2006 to 21 Sept 2006

Publication series

Name: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Volume: 5

Conference

Conference: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Country/Territory: United States
City: Pittsburgh, PA
Period: 17/09/06 to 21/09/06

Keywords / Materials (for Non-textual outputs)

  • HMM-based speech synthesis
  • HSMM
  • Multiple regression HMM
  • Voice quality control
