Abstract / Description of output
Pitch tracking, or the continuous extraction of fundamental frequency from speech waveforms, is of vital importance to many applications in speech analysis and synthesis. Many existing trackers, including conventional ones such as Praat, RAPT and YIN, and newly proposed neural-network-based ones such as DNN-CLS, CREPE and RNN-REG, have conducted an extensive investigation into speech pitch tracking. This work developed a different end-to-end regression model based on neural networks, where a voice detector and a newly proposed value estimator work jointly to highlight the trajectory of fundamental frequency. Experiments on the PTDB-TUG corpus showed that the system surpasses canonical neural networks in terms of gross error rate. It further outperformed conventional trackers under clean condition and neural-network classifiers under noisy condition by the NOISEX-92 corpus.
Original language | English |
---|---|
Title of host publication | Proc. Interspeech 2019 |
Publisher | International Speech Communication Association |
Pages | 1995-1999 |
Number of pages | 5 |
DOIs | |
Publication status | Published - 19 Sept 2019 |
Event | Interspeech 2019 - Graz, Austria Duration: 15 Sept 2019 → 19 Sept 2019 https://www.interspeech2019.org/ |
Publication series
Name | |
---|---|
Publisher | International Speech Communication Association |
ISSN (Electronic) | 1990-9772 |
Conference
Conference | Interspeech 2019 |
---|---|
Country/Territory | Austria |
City | Graz |
Period | 15/09/19 → 19/09/19 |
Internet address |
Keywords / Materials (for Non-textual outputs)
- fundamental frequency
- pitch tracking
- neural network