TY - GEN
T1 - Shape matters
T2 - 11th International Conference on Speech Prosody, Speech Prosody 2022
AU - Cole, Jennifer
AU - Steffman, Jeremy
AU - Tilsen, Sam
N1 - /
PY - 2022/5/26
Y1 - 2022/5/26
N2 - In Autosegmental-Metrical models of intonational phonology, pitch accents, phrase accents and boundary tones may combine freely to create a predicted set of phonologically distinct phrase-final “nuclear” tunes. In this study we ask if an 8-way distinction in nuclear tune shape in American English, predicted from combinations of 2 (monotonal) pitch accents, 2 phrase accents and 2 boundary tones, is manifest in speech production and in speech perception. F0 trajectories from an imitative speech production experiment were analyzed using (i) neural net classification, and (ii) human listeners’ perceptual discrimination of the model utterances. Pairwise classification accuracy of the imitative productions is highest for tune pairs that differ in holistic shape (high-rising vs. rise-fall), and poorest for tunes with the same shape that differ in (higher vs. lower) final f0. Perception results show a similar pattern, with poor pairwise discrimination for tunes that differ primarily, but by a small degree, in final f0. Together the results suggest a hierarchy of distinctiveness among nuclear tunes, with a robust distinction based on holistic tune shape, which only partly aligns with distinctions in tonal specification, and a weak/poorly differentiated distinction between tunes with the same holistic shape but small differences in final f0.
AB - In Autosegmental-Metrical models of intonational phonology, pitch accents, phrase accents and boundary tones may combine freely to create a predicted set of phonologically distinct phrase-final “nuclear” tunes. In this study we ask if an 8-way distinction in nuclear tune shape in American English, predicted from combinations of 2 (monotonal) pitch accents, 2 phrase accents and 2 boundary tones, is manifest in speech production and in speech perception. F0 trajectories from an imitative speech production experiment were analyzed using (i) neural net classification, and (ii) human listeners’ perceptual discrimination of the model utterances. Pairwise classification accuracy of the imitative productions is highest for tune pairs that differ in holistic shape (high-rising vs. rise-fall), and poorest for tunes with the same shape that differ in (higher vs. lower) final f0. Perception results show a similar pattern, with poor pairwise discrimination for tunes that differ primarily, but by a small degree, in final f0. Together the results suggest a hierarchy of distinctiveness among nuclear tunes, with a robust distinction based on holistic tune shape, which only partly aligns with distinctions in tonal specification, and a weak/poorly differentiated distinction between tunes with the same holistic shape but small differences in final f0.
KW - intonation production
KW - intonation perception
KW - nuclear tunes
KW - neural net classification
KW - deep learning
U2 - 10.21437/SpeechProsody.2022-61
DO - 10.21437/SpeechProsody.2022-61
M3 - Conference contribution
T3 - Proceedings of the International Conference on Speech Prosody
SP - 297
EP - 301
BT - Proceedings of the International Conference on Speech Prosody 2022
PB - ISCA
Y2 - 23 May 2022 through 26 May 2022
ER -