Prediction of creaky voice from contextual factors

T. Drugman, J. Kane, T. Raitio, C. Gobl

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Creaky voice, also referred to as vocal fry, is a voice quality frequently produced in many languages, in both read and conversational speech. To sound more natural, speech synthesisers should be able to generate speech in all its expressive diversity, which includes a proper use of creaky voice. The goal of this paper is two-fold. First, we analyse how informative contextual factors are for predicting the use of creaky voice. A few contextual factors related to speech production preceding a silence or a pause are observed to be of particular interest. This study confirms that creaky voice plays a crucial syntactic role, allowing for a better structuring of phrases. In a second experiment, we investigate HMM-based prediction of creakiness from contextual factors. Four methods are compared on a US English speaker and a Finnish speaker. The best prediction technique achieves a promising performance, comparable to that of the creaky detection algorithm on which the HMMs were trained.
Original language: English
Title of host publication: Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 5
ISBN (Print): 978-1-4799-0356-6
Publication status: Published - 1 May 2013


  • hidden Markov models
  • natural language processing
  • prediction theory
  • signal detection
  • speech synthesis
  • Finnish speaker
  • HMM
  • US English speaker
  • contextual factors
  • conversational speech
  • creakiness prediction
  • creaky detection algorithm
  • creaky voice prediction
  • read speech
  • speech production
  • speech synthesisers naturalness
  • vocal fry
  • voice quality
  • Acoustics
  • Educational institutions
  • Hidden Markov models
  • Measurement
  • Speech
  • Speech synthesis
  • Training
  • Contextual Factors
  • Creaky voice
  • Expressive Speech
  • Speech Synthesis

