Abstract
Creaky voice, also referred to as vocal fry, is a voice quality frequently produced in many languages, in both read and conversational speech. To sound natural, speech synthesisers should be able to generate speech in all its expressive diversity, which includes appropriate use of creaky voice. The goal of this paper is two-fold. First, we analyse how informative contextual factors are for predicting the use of creaky voice. A few contextual factors related to speech production preceding a silence or a pause are found to be of particular interest. This analysis confirms that creaky voice plays a crucial syntactic role, allowing for a better structuring of phrases. Second, we investigate HMM-based prediction of creakiness from contextual factors. Four methods are compared on a US English speaker and a Finnish speaker. The best prediction technique achieves promising performance, comparable to that of the creaky detection algorithm on which the HMMs were trained.
| Original language | English |
|---|---|
| Title of host publication | Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on |
| Publisher | Institute of Electrical and Electronics Engineers |
| Pages | 7967-7971 |
| Number of pages | 5 |
| ISBN (Print) | 978-1-4799-0356-6 |
| Publication status | Published - 1 May 2013 |
Keywords / Materials (for Non-textual outputs)
- hidden Markov models
- natural language processing
- prediction theory
- signal detection
- speech synthesis
- Finnish speaker
- HMM
- US English speaker
- contextual factors
- conversational speech
- creakiness prediction
- creaky detection algorithm
- creaky voice prediction
- read speech
- speech production
- speech synthesisers naturalness
- vocal fry
- voice quality
- Acoustics
- Educational institutions
- Hidden Markov models
- Measurement
- Speech
- Speech synthesis
- Training
- Contextual Factors
- Creaky voice
- Expressive Speech
- Speech Synthesis
Fingerprint
Dive into the research topics of 'Prediction of creaky voice from contextual factors'. Together they form a unique fingerprint.
Projects
- Simple4All: Speech synthesis that improves through adaptive learning (Finished)
  King, S. (Principal Investigator) & Renals, S. (Co-investigator)
  1/11/11 → 31/10/14
  Project: Research