The role of context in neural pitch accent detection in English

Elizabeth Nielsen, Mark Steedman, Sharon Goldwater

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract / Description of output

Prosody is a rich information source in natural language, serving as a marker for phenomena such as contrast. In order to make this information available to downstream tasks, we need away to detect prosodic events in speech. We propose a new model for pitch accent detection, inspired by the work of Stehwien et al. (2018), who presented a CNN-based model for this task. Our model makes greater use of context by using full utterances as input and adding an LSTM layer. We find that these innovations lead to an improvement from 87.5 percent to 88.7 percent accuracy on pitch accent detection on American English speech in the Boston University Radio News Corpus, a state-of-the-art result. We also find that a simple baseline that just predicts a pitch accent on every content word yields 82.2 percent accuracy, and we suggest that this is the appropriate baseline for this task. Finally, we conduct ablation tests that show pitch is the most important acoustic feature for this task and this corpus.
Original languageEnglish
Title of host publicationProceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
Number of pages7
Publication statusAccepted/In press - 18 Sept 2020
EventThe 2020 Conference on Empirical Methods in Natural Language Processing - Virtual conference
Duration: 16 Nov 202020 Nov 2020
https://2020.emnlp.org/

Conference

ConferenceThe 2020 Conference on Empirical Methods in Natural Language Processing
Abbreviated titleEMNLP 2020
CityVirtual conference
Period16/11/2020/11/20
Internet address

Fingerprint

Dive into the research topics of 'The role of context in neural pitch accent detection in English'. Together they form a unique fingerprint.

Cite this