We describe a speech recogniser which uses a speech-production-motivated phonetic-feature description of speech. We argue that this is a natural way to describe the speech signal and that it offers an efficient intermediate parameterisation for use in speech recognition. We also propose to model this description at the syllable level rather than the phone level. The ultimate goal of this work is to generate syllable models whose parameters explicitly describe the trajectories of the phonetic features of the syllable. We aim to move away from Hidden Markov Models (HMMs) of context-dependent phone units. As a step towards this, we present a preliminary system which consists of two parts: recognition of the phonetic features from the speech signal using a neural network, and decoding of the feature-based description into phonemes using HMMs.
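The second stage of the preliminary system (decoding a frame-by-frame phonetic-feature description into phonemes with an HMM) can be illustrated with a toy sketch. Everything in it is a hypothetical assumption for illustration, not the paper's design. The three binary features, the four-phone inventory, the emission model that treats features as independent, and the one-state-per-phone ergodic topology with a fixed self-loop probability are all invented. The first-stage neural network is replaced by hand-written per-frame feature posteriors.

```python
import math

# HYPOTHETICAL binary phonetic features (illustrative only, not the paper's feature set).
FEATURES = ["voiced", "nasal", "fricative"]

# HYPOTHETICAL phoneme inventory: each phone is described by target values of FEATURES.
PHONES = {
    "m": (1, 1, 0),
    "s": (0, 0, 1),
    "z": (1, 0, 1),
    "a": (1, 0, 0),
}


def emission_logprob(posteriors, target):
    """Log-likelihood of one frame's feature posteriors under a phone's feature
    targets, ASSUMING the features are statistically independent."""
    eps = 1e-6  # floor to avoid log(0)
    total = 0.0
    for p, t in zip(posteriors, target):
        q = p if t == 1 else 1.0 - p
        total += math.log(max(q, eps))
    return total


def viterbi_decode(frames, self_loop=0.8):
    """Viterbi-decode per-frame feature posteriors into a phone sequence, using an
    ASSUMED ergodic HMM with one state per phone and a fixed self-loop probability."""
    phones = list(PHONES)
    n = len(phones)
    stay = math.log(self_loop)
    switch = math.log((1.0 - self_loop) / (n - 1))  # remaining mass shared equally

    # Uniform initial state distribution.
    scores = [math.log(1.0 / n) + emission_logprob(frames[0], PHONES[ph]) for ph in phones]
    backptrs = []
    for frame in frames[1:]:
        new_scores, ptrs = [], []
        for j, ph in enumerate(phones):
            # Best predecessor state for state j.
            best_i = max(range(n), key=lambda i: scores[i] + (stay if i == j else switch))
            best = scores[best_i] + (stay if best_i == j else switch)
            new_scores.append(best + emission_logprob(frame, PHONES[ph]))
            ptrs.append(best_i)
        scores = new_scores
        backptrs.append(ptrs)

    # Trace back the best state path from the highest-scoring final state.
    state = max(range(n), key=lambda i: scores[i])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()

    # Collapse runs of the same state into one phone symbol.
    seq = []
    for s in path:
        if not seq or seq[-1] != phones[s]:
            seq.append(phones[s])
    return seq


if __name__ == "__main__":
    # Hand-written posteriors standing in for neural-network outputs (voiced, nasal, fricative).
    frames = [
        (0.9, 0.9, 0.1), (0.9, 0.8, 0.1),    # nasal-like frames
        (0.9, 0.1, 0.1), (0.95, 0.05, 0.1),  # vowel-like frames
        (0.1, 0.1, 0.9), (0.1, 0.05, 0.95),  # voiceless-fricative-like frames
    ]
    print(viterbi_decode(frames))  # ['m', 'a', 's']
```

Because each phone has a single state, collapsing runs cannot distinguish a genuinely repeated phone from a long one; a fuller model would use multi-state phone HMMs and, in the direction the abstract proposes, syllable-level units.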
|Title of host publication||ICSLP '98|
|Subtitle of host publication||5th International Conference on Spoken Language Processing|
|Publisher||International Speech Communication Association|
|Number of pages||4|
|ISSN||1990-9772|
|Publication status||Published - 1 Dec 1998|