Paragraph-based Prosodic Cues for Speech Synthesis Applications

Mireia Farrús, Catherine Lai, Johanna Moore

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Speech synthesis has improved in both expressiveness and voice quality in
recent years. However, obtaining full expressiveness when dealing with large
multi-sentential synthesized discourse is still a challenge, since speech
synthesizers do not take into account the prosodic differences that have been
observed in discourse units such as paragraphs. The current study validates
and extends previous work by analyzing the prosody of paragraph units in a
large and diverse corpus of TED Talks using automatically extracted F0,
intensity and timing features. In addition, a series of classification
experiments was performed in order to identify which features are consistently
used to distinguish paragraph breaks. The results show significant differences
in prosody related to paragraph position. Moreover, the classification
experiments show that boundary features such as pause duration and differences
in F0 and intensity levels are the most consistent cues in marking paragraph
boundaries. This suggests that these features should be taken into account when
generating spoken discourse in order to improve naturalness and expressiveness.
Original language: English
Title of host publication: Proceedings of Speech Prosody 2016
Number of pages: 5
Publication status: Published - Jun 2016
Event: Speech Prosody 2016 - Boston, United States
Duration: 31 May 2016 to 3 Jun 2016


Conference: Speech Prosody 2016
Country/Territory: United States