Edinburgh Research Explorer

Voice source modelling using deep neural networks for statistical parametric speech synthesis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Open Access permissions

Open

Documents

    Rights statement: © Raitio, T., Lu, H., Kane, J., Suni, A., Vainio, M., King, S., & Alku, P. (2014). Voice source modelling using deep neural networks for statistical parametric speech synthesis. In European Signal Processing Conference. (pp. 2290-2294). [6952838] European Signal Processing Conference, EUSIPCO.

    Accepted author manuscript, 181 KB, PDF document

http://www.eurasip.org/Proceedings/Eusipco/Eusipco2014/EUSIPCO2014.html
Original language: English
Title of host publication: European Signal Processing Conference
Publisher: European Signal Processing Conference, EUSIPCO
Pages: 2290-2294
Number of pages: 5
ISBN (Print): 9780992862619
Publication status: Published - 1 Sep 2014
Event: 22nd European Signal Processing Conference, EUSIPCO 2014 - Lisbon, Portugal
Duration: 1 Sep 2014 - 5 Sep 2014

Conference

Conference: 22nd European Signal Processing Conference, EUSIPCO 2014
Country: Portugal
City: Lisbon
Period: 1/09/14 - 5/09/14

Abstract

This paper presents a voice source modelling method employing a deep neural network (DNN) to map from acoustic features to the time-domain glottal flow waveform. First, acoustic features and the glottal flow signal are estimated from each frame of the speech database. Pitch-synchronous glottal flow time-domain waveforms are extracted, interpolated to a constant duration, and stored in a codebook. Then, a DNN is trained to map from acoustic features to these duration-normalised glottal waveforms. At synthesis time, acoustic features are generated from a statistical parametric model, and from these, the trained DNN predicts the glottal flow waveform. Illustrations are provided to demonstrate that the proposed method successfully synthesises the glottal flow waveform and enables easy modification of the waveform by adjusting the input values to the DNN. In a subjective listening test, the proposed method was rated as equal to a high-quality method employing a stored glottal flow waveform.
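The two core steps described in the abstract can be sketched in code. The following is an illustrative NumPy sketch, not the authors' implementation: the network sizes, feature dimension, pulse length, and training loop are all assumed for illustration. It shows (a) duration normalisation of a pitch-synchronous glottal pulse by interpolation to a fixed length, and (b) a minimal one-hidden-layer network standing in for the DNN that maps acoustic features to the duration-normalised waveform.

```python
import numpy as np

def normalise_pulse(pulse, target_len=400):
    """Interpolate a variable-length glottal flow pulse to a fixed length,
    as done before storing pulses in the codebook (target_len is assumed)."""
    x_old = np.linspace(0.0, 1.0, num=len(pulse))
    x_new = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(x_new, x_old, pulse)

class GlottalDNN:
    """Toy stand-in for the acoustic-features -> glottal-waveform DNN:
    one tanh hidden layer, linear output (one output per waveform sample)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def predict(self, feats):
        h = np.tanh(feats @ self.W1 + self.b1)
        return h @ self.W2 + self.b2  # predicted duration-normalised pulse

    def train_step(self, feats, target, lr=1e-3):
        """One gradient step on mean squared error (backprop by hand)."""
        h = np.tanh(feats @ self.W1 + self.b1)
        out = h @ self.W2 + self.b2
        err = out - target
        dW2 = np.outer(h, err)
        dh = (err @ self.W2.T) * (1.0 - h ** 2)
        dW1 = np.outer(feats, dh)
        self.W2 -= lr * dW2
        self.b2 -= lr * err
        self.W1 -= lr * dW1
        self.b1 -= lr * dh
        return float(np.mean(err ** 2))
```

At synthesis time, the fixed-length pulse predicted from the generated acoustic features would be interpolated back to the target pitch period and concatenated pitch-synchronously to form the excitation signal.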

    Research areas

  • Deep neural network, DNN, glottal flow, statistical parametric speech synthesis, voice source modelling

Event

22nd European Signal Processing Conference, EUSIPCO 2014

1/09/14 - 5/09/14

Lisbon, Portugal

Event: Conference


ID: 19841872