Abstract
This paper proposes a new approach to duration modelling for statistical parametric speech synthesis in which a recurrent statistical model is trained to output a phone transition probability at each timestep (acoustic frame). Unlike conventional approaches to duration modelling - which assume that duration distributions have a particular form (e.g., a Gaussian) and use the mean of that distribution for synthesis - our approach can in principle model any distribution supported on the non-negative integers. Generation from this model can be performed in many ways; here we consider output generation based on the median predicted duration. The median is more typical (more probable) than the conventional mean duration, is robust to training-data irregularities, and enables incremental generation. Furthermore, a frame-level approach to duration prediction is consistent with a longer-term goal of modelling durations and acoustic features together. Results indicate that the proposed method is competitive with baseline approaches in approximating the median duration of held-out natural speech.
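The relationship between per-frame transition probabilities and a median duration can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the model emits one transition probability per frame, that a phone lasts at least one frame, and that the median is taken as the first frame at which the cumulative probability of having left the phone reaches 0.5. The function names `duration_pmf` and `median_duration` are hypothetical.

```python
from typing import Iterable, List


def duration_pmf(transition_probs: List[float]) -> List[float]:
    """Return P(D = d) for d = 1..len(transition_probs).

    transition_probs[t] is the assumed probability of leaving the phone at
    frame t + 1, given that the phone has lasted until that frame.
    """
    pmf = []
    survival = 1.0  # probability the phone has not ended before this frame
    for p in transition_probs:
        pmf.append(survival * p)
        survival *= 1.0 - p
    return pmf


def median_duration(transition_probs: Iterable[float]) -> int:
    """Return the smallest duration (in frames) whose CDF reaches 0.5.

    Probabilities are consumed one frame at a time, so the search stops as
    soon as the median is reached and never needs later frames.
    """
    survival = 1.0
    for d, p in enumerate(transition_probs, start=1):
        survival *= 1.0 - p
        if survival <= 0.5:  # equivalent to CDF(d) >= 0.5
            return d
    raise ValueError("cumulative probability never reached 0.5")


if __name__ == "__main__":
    probs = [0.1, 0.2, 0.5, 0.9]  # made-up values for illustration only
    print(duration_pmf(probs))
    print(median_duration(probs))
```

Because `median_duration` reads one frame at a time and returns as soon as the threshold is crossed, it reflects the incremental-generation property mentioned in the abstract; no assumption about the shape of the distribution is needed.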
| Original language | English |
|---|---|
| Title of host publication | 2016 IEEE Spoken Language Technology Workshop (SLT) |
| Publisher | Institute of Electrical and Electronics Engineers |
| Pages | 686-692 |
| Number of pages | 7 |
| ISBN (Electronic) | 978-1-5090-4903-5 |
| ISBN (Print) | 978-1-5090-4904-2 |
| Publication status | Published - 9 Feb 2017 |
| Event | 2016 IEEE Spoken Language Technology Workshop, San Diego, United States. Duration: 13 Dec 2016 → 16 Dec 2016. https://www2.securecms.com/SLT2016//Default.asp |
Conference
| Conference | 2016 IEEE Spoken Language Technology Workshop |
|---|---|
| Abbreviated title | IEEE SLT 2016 |
| Country/Territory | United States |
| City | San Diego |
| Period | 13/12/16 → 16/12/16 |
| Internet address | https://www2.securecms.com/SLT2016//Default.asp |
Fingerprint
Dive into the research topics of 'Median-based generation of synthetic speech durations using a non-parametric approach'. Together they form a unique fingerprint.
Projects
- Natural Speech Technology (Finished)
  Renals, S. (Principal Investigator) & King, S. (Co-investigator)
  1/05/11 → 31/07/16
  Project: Research