Speech synthesis that improves through adaptive learning
This project aims to develop an original speech synthesis technology that learns from data with little or no expert supervision and continually improves itself, simply by being used.
The speech synthesis system will be portable to new languages with minimal effort; it will first be developed for the four languages of the consortium partners (English, Finnish, Spanish, Romanian) and then extended to at least six additional ones. The system will also make it easy for non-expert users to create new voices and indeed entire systems in new languages. All developments will be made available to the community under Open Source licenses.
Target Group of the project
The target group consists of application developers who need to include speech generation in their systems, for instance telecommunication companies or game developers.
Objectives and Innovation
The project aims to enable speech generation in new languages and with new voices with only minimal effort. One innovation is the use of a statistical data-driven approach in the text processing component, making it learnable from data. Another innovation is the use of machine learning techniques to improve the quality of the output speech based on user feedback, and the adaptation of the speaking style to the genre of the text (i.e., the type of content).
Results of the project
The main result of the project is a highly adaptable and portable complete speech synthesis system which can be used for a wide variety of languages and domains. Project partners will use models and algorithms which enable every component of a speech synthesiser to be learned from data, with little or no supervision, and which enable learning to continue whilst the system is in use. Those models will be flexible and capable of producing a range of speaking styles, including expressive, conversational, and highly intelligible speech.
Impact (scientific, technical, socio-economic)
The project will improve the usability of speech generation by making the speech sound more natural. By reducing the development costs of new speech generation systems, it will also extend the use of speech generation to lesser-resourced languages and to large numbers of niche domains.