HMM-based speech synthesis adaptation using noisy data: Analysis and evaluation methods

R. Karhila, U. Remes, M. Kurimo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper investigates the role of noise in speaker-adaptation of HMM-based text-to-speech (TTS) synthesis and presents a new evaluation procedure. Both a new listening test based on ITU-T Recommendation P.835 and a perceptually motivated objective measure, frequency-weighted segmental SNR, improve the evaluation of synthetic speech when noise is present. The evaluation of voices adapted with noisy data shows that noise plays a relatively small but noticeable role in the quality of synthetic speech: naturalness and speaker similarity are not significantly affected by the noise, but listeners prefer the voices trained from cleaner data. Noise removal, even when it degrades natural speech quality, improves the synthetic voice.
Original language: English
Title of host publication: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 6930-6934
Number of pages: 5
Publication status: Published - 1 May 2013

Keywords

  • hidden Markov models
  • signal denoising
  • speaker recognition
  • speech synthesis
  • HMM-based speech synthesis adaptation
  • ITU-T recommendation 835
  • TTS synthesis
  • analysis methods
  • evaluation methods
  • frequency-weighted segmental SNR
  • listening test
  • natural speech quality
  • noise removal
  • noisy data
  • perceptually motivated objective measure
  • speaker-adaptation
  • synthetic voice
  • text-to-speech synthesis
  • Noise measurement
  • Signal to noise ratio
  • Speech
  • Speech enhancement
  • Adaptation
  • Evaluation
  • Feature extraction
  • Noise robustness
