The objective of the project was to produce a synthetic voice without recourse to the phonological and phonetic analysis required for creating a traditional phone set and a pronunciation lexicon. The input to the system would be speech recorded from a single speaker, divided into utterances, each accompanied by a normalised text transcription. No language-specific techniques would be applied. The only restriction we imposed was that the orthographic system of the language be syllabic or alphabetic rather than logographic or logophonetic.
Speech synthesis is the conversion of text to speech by computer. Conventional methods require a great deal of pre-existing knowledge to be brought into play, such as a large pronunciation dictionary and knowledge of the speech sounds that make up the language. This makes it very expensive to build a system for any language that lacks these resources, which means almost all the languages of the world. This project aimed to improve the situation by creating speech synthesis systems that require neither knowledge of the sound system nor a pronunciation dictionary.
The project succeeded in producing both statistical parametric and unit selection "emergent phone" systems. In addition, we created orthographic unit-based systems. These systems were evaluated against classical phone-based systems. A large number of techniques were evaluated for generating these systems and applied to the two main underlying problems that needed to be solved (a sketch of both steps follows the list), namely:
* Segmenting and categorising units of speech.
* Generalising from a database of segmented and categorised speech from a single speaker to predict the expected units in unseen words.
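To make these two sub-problems concrete, the following is a minimal sketch (in Python, assuming librosa and scikit-learn are available) of one way to segment a recording at points of spectral change and then categorise the resulting segments by clustering, yielding "emergent phone" labels without any phonetic knowledge. The feature choice, boundary heuristic and k-means clustering are illustrative stand-ins, not the techniques the project actually evaluated.

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans

def emergent_phone_labels(wav_path, n_classes=40, hop_length=160):
    """Toy illustration: segment speech frames, then cluster segments into unit classes."""
    y, sr = librosa.load(wav_path, sr=16000)

    # Frame-level spectral features; MFCCs stand in for whatever
    # parameterisation a real system would use.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop_length).T

    # (1) Segmentation: place a boundary wherever consecutive frames
    # differ by more than a simple threshold on spectral change.
    dist = np.linalg.norm(np.diff(mfcc, axis=0), axis=1)
    threshold = dist.mean() + dist.std()
    boundaries = [0] + [i + 1 for i, d in enumerate(dist) if d > threshold] + [len(mfcc)]
    spans = [(a, b) for a, b in zip(boundaries[:-1], boundaries[1:]) if b > a]

    # (2) Categorisation: summarise each segment by its mean feature vector
    # and cluster the segments into a fixed inventory of unit classes.
    segments = np.array([mfcc[a:b].mean(axis=0) for a, b in spans])
    km = KMeans(n_clusters=min(n_classes, len(segments)), n_init=10, random_state=0)
    labels = km.fit_predict(segments)

    # Return (start_frame, end_frame, unit_class) triples.
    return [(a, b, int(l)) for (a, b), l in zip(spans, labels)]
```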
The main advance made in the project was the use of orthographic unit-based systems (i.e., using graphemes instead of phonemes), and this notion has been followed up in subsequent projects.
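The grapheme-based idea can be illustrated with a toy front end: the normalised transcription itself supplies the unit sequence, so no phone set or pronunciation lexicon is needed. The function below is a hypothetical illustration of that principle, not code from the project; the word-boundary marker `#` is an assumption for the example.

```python
def graphemes(normalised_text):
    """Map normalised text to a sequence of orthographic units (letters),
    with '#' marking word boundaries."""
    units = ["#"]
    for word in normalised_text.lower().split():
        units.extend(ch for ch in word if ch.isalpha())
        units.append("#")
    return units

print(graphemes("Hello world"))
# ['#', 'h', 'e', 'l', 'l', 'o', '#', 'w', 'o', 'r', 'l', 'd', '#']
```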
| Acronym | ePhones |
| --- | --- |
| Status | Finished |
| Effective start/end date | 1/07/06 → 30/09/10 |