A fixed dimension and perceptually based dynamic sinusoidal model of speech

Qiong Hu, Y. Stylianou, Korin Richmond, Ranniery Maia, Junichi Yamagishi, Javier Latorre

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract / Description of output

This paper presents a fixed- and low-dimensional, perceptually based dynamic sinusoidal model of speech referred to as PDM (Perceptual Dynamic Model). To decrease and fix the number of sinusoidal components typically used in the standard sinusoidal model, we propose to use only one dynamic sinusoidal component per critical band. For each band, the sinusoid with the maximum spectral amplitude is selected and associated with the centre frequency of that critical band. The model is expanded at low frequencies by incorporating sinusoids at the boundaries of the corresponding bands while at the higher frequencies a modulated noise component is used. A listening test is conducted to compare speech reconstructed with PDM and state-of-the-art models of speech, where all models are constrained to use an equal number of parameters. The results show that PDM is clearly preferred in terms of quality over the other systems.
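The band-wise selection step described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the band edges are the commonly tabulated Zwicker critical-band edges (the paper may define its bands differently), the band centre is taken as the arithmetic midpoint, and the function name `select_band_peaks` is hypothetical. The dynamic (time-varying) terms, the extra low-frequency sinusoids at band boundaries, and the modulated noise component are all omitted.

```python
# Commonly tabulated Zwicker critical-band edges in Hz, truncated at 7.7 kHz.
# Assumption: the paper may use a different band definition.
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300,
                 6400, 7700]


def select_band_peaks(freqs_hz, amps, edges_hz=BARK_EDGES_HZ):
    """Keep one component per band: the largest amplitude, placed at the band centre.

    Returns a list of (centre_hz, amplitude) tuples, one per band.
    Bands containing no input component get amplitude 0.0.
    """
    peaks = []
    for lo, hi in zip(edges_hz[:-1], edges_hz[1:]):
        # Amplitudes of all input sinusoids whose frequency falls in [lo, hi).
        in_band = [a for f, a in zip(freqs_hz, amps) if lo <= f < hi]
        centre = 0.5 * (lo + hi)  # arithmetic midpoint, an illustrative choice
        peaks.append((centre, max(in_band, default=0.0)))
    return peaks


# Toy harmonic spectrum: f0 = 120 Hz, amplitudes decaying as 1/k.
harmonics = [120.0 * k for k in range(1, 40)]
amplitudes = [1.0 / k for k in range(1, 40)]
model = select_band_peaks(harmonics, amplitudes)
```

Whatever the number of harmonics in the input, the output always has one entry per band (21 with the edges above), which is the sense in which such a representation has a fixed dimension.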
Original language: English
Title of host publication: Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Publisher: Institute of Electrical and Electronics Engineers
Pages: 6311-6315
Publication status: Published - May 2014

Keywords / Materials (for Non-textual outputs)

  • Sinusoidal Model
  • Critical band
  • Vocoder
