Direct F0 Estimation with Neural-Network-based Regression

Shuzhuang Xu, Hiroshi Shimodaira

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Pitch tracking, or the continuous extraction of fundamental frequency from speech waveforms, is of vital importance to many applications in speech analysis and synthesis. Many existing trackers, including conventional ones such as Praat, RAPT and YIN, and newly proposed neural-network-based ones such as DNN-CLS, CREPE and RNN-REG, have conducted an extensive investigation into speech pitch tracking. This work developed a different end-to-end regression model based on neural networks, where a voice detector and a newly proposed value estimator work jointly to highlight the trajectory of fundamental frequency. Experiments on the PTDB-TUG corpus showed that the system surpasses canonical neural networks in terms of gross error rate. It further outperformed conventional trackers under clean condition and neural-network classifiers under noisy condition by the NOISEX-92 corpus.
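The abstract reports results in terms of gross error rate without defining it on this page. As a rough illustration only, the sketch below uses one common convention from the pitch-tracking literature: among frames that both the reference and the tracker mark as voiced, count those whose estimate deviates from the reference by more than a relative tolerance. The function name `gross_error_rate`, the 20% tolerance, and the use of 0 Hz to mark unvoiced frames are assumptions made for this example, not details taken from the paper.

```python
def gross_error_rate(f0_ref, f0_est, tolerance=0.2):
    """Share of jointly voiced frames whose F0 estimate is off by more than `tolerance`.

    Assumptions (not from the paper): F0 values are in Hz, one value per frame,
    and an unvoiced frame is encoded as 0.
    """
    if len(f0_ref) != len(f0_est):
        raise ValueError("reference and estimate must have the same number of frames")

    # Keep only frames that both tracks consider voiced.
    voiced_pairs = [(r, e) for r, e in zip(f0_ref, f0_est) if r > 0 and e > 0]
    if not voiced_pairs:
        return 0.0  # no jointly voiced frames, so no gross errors can be counted

    # A gross error is a relative deviation larger than the tolerance.
    gross = sum(1 for r, e in voiced_pairs if abs(e - r) / r > tolerance)
    return gross / len(voiced_pairs)


# Hypothetical five-frame example: frames 1 and 2 are voiced in both tracks,
# and only frame 2 (200 Hz vs 300 Hz) exceeds the 20% tolerance.
reference = [0.0, 100.0, 200.0, 150.0, 0.0]
estimate = [0.0, 105.0, 300.0, 0.0, 120.0]
print(gross_error_rate(reference, estimate))  # 0.5
```

Voicing decision errors (frames 3 and 4 in the example) are deliberately excluded here; they are usually scored separately, which mirrors the split between a voice detector and a value estimator described in the abstract.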
Original language: English
Title of host publication: Proc. Interspeech 2019
Publisher: International Speech Communication Association
Pages: 1995-1999
Number of pages: 5
Publication status: Published - 19 Sept 2019
Event: Interspeech 2019 - Graz, Austria
Duration: 15 Sept 2019 to 19 Sept 2019
https://www.interspeech2019.org/

Publication series

Publisher: International Speech Communication Association
ISSN (Electronic): 1990-9772

Conference

Conference: Interspeech 2019
Country/Territory: Austria
City: Graz
Period: 15/09/19 to 19/09/19

Keywords / Materials (for Non-textual outputs)

  • fundamental frequency
  • pitch tracking
  • neural network
