Direct F0 Estimation with Neural-Network-based Regression

Shuzhuang Xu, Hiroshi Shimodaira

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Pitch tracking, or the continuous extraction of fundamental frequency from speech waveforms, is of vital importance to many applications in speech analysis and synthesis. Many existing trackers, including conventional ones such as Praat, RAPT and YIN, and newly proposed neural-network-based ones such as DNN-CLS, CREPE and RNN-REG, have conducted an extensive investigation into speech pitch tracking. This work developed a different end-to-end regression model based on neural networks, where a voice detector and a newly proposed value estimator work jointly to highlight the trajectory of fundamental frequency. Experiments on the PTDB-TUG corpus showed that the system surpasses canonical neural networks in terms of gross error rate. It further outperformed conventional trackers under clean condition and neural-network classifiers under noisy condition by the NOISEX-92 corpus.
Original language: English
Title of host publication: Proc. Interspeech 2019
Publisher: International Speech Communication Association
Number of pages: 5
Publication status: Published - 19 Sep 2019
Event: Interspeech 2019 - Graz, Austria
Duration: 15 Sep 2019 to 19 Sep 2019

Publication series

Publisher: International Speech Communication Association
ISSN (Electronic): 1990-9772


Conference: Interspeech 2019


  • fundamental frequency
  • pitch tracking
  • neural network

