Abstract
Speech synthesis has improved in both expressiveness and voice quality in
recent years. However, obtaining full expressiveness when dealing with large
multi-sentential synthesized discourse is still a challenge, since speech
synthesizers do not take into account the prosodic differences that have been
observed in discourse units such as paragraphs. The current study validates
and extends previous work by analyzing the prosody of paragraph units in a
large and diverse corpus of TED Talks using automatically extracted F0,
intensity and timing features. In addition, a series of classification
experiments was performed in order to identify which features are consistently
used to distinguish paragraph breaks. The results show significant differences
in prosody related to paragraph position. Moreover, the classification
experiments show that boundary features such as pause duration and differences
in F0 and intensity levels are the most consistent cues in marking paragraph
boundaries. This suggests that these features should be taken into account when
generating spoken discourse in order to improve naturalness and expressiveness.
Original language | English |
---|---|
Title of host publication | Proceedings of Speech Prosody 2016 |
Number of pages | 5 |
Publication status | Published - Jun 2016 |
Event | Speech Prosody 2016, Boston, United States. Duration: 31 May 2016 → 3 Jun 2016. http://sites.bu.edu/speechprosody2016/ |
Conference
Conference | Speech Prosody 2016 |
---|---|
Country/Territory | United States |
City | Boston |
Period | 31/05/16 → 3/06/16 |
Internet address | http://sites.bu.edu/speechprosody2016/ |