The trajectory HMM has been shown to be useful for model-based speech synthesis, where a smoothed trajectory is generated using temporal constraints imposed by dynamic features. To evaluate the performance of such a model on an ASR task, we present a trajectory decoder based on tree search with delayed path merging. An experiment on a speaker-dependent phone recognition task using the MOCHA-TIMIT database shows that the MLE-trained trajectory model, while retaining the attractive properties of a proper generative model, tends to favour over-smoothed trajectories among competing hypotheses and does not perform better than a conventional HMM. We use this result to argue that models giving a better fit on training data may suffer reduced discrimination by being too faithful to the training data. This partially explains why alternative acoustic models that try to model temporal constraints explicitly do not achieve significant improvements in ASR.
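As an illustration of how dynamic features impose temporal constraints on a generated trajectory, the following minimal sketch solves the standard maximum-likelihood parameter generation equations, (WᵀΣ⁻¹W) c = WᵀΣ⁻¹μ, for a fixed state sequence. It is not the decoder described in the abstract. It assumes a scalar static feature, a single central-difference delta window with clamped edges, and diagonal variances; the function names `build_window_matrix`, `solve_linear_system`, and `generate_trajectory` are hypothetical.

```python
def build_window_matrix(num_frames):
    """Return W (2T x T): row 2t is the static window, row 2t+1 the delta window."""
    rows = []
    for t in range(num_frames):
        static = [0.0] * num_frames
        static[t] = 1.0
        # Assumed delta window: 0.5 * (c[t+1] - c[t-1]), indices clamped at the edges.
        delta = [0.0] * num_frames
        delta[min(t + 1, num_frames - 1)] += 0.5
        delta[max(t - 1, 0)] -= 0.5
        rows.append(static)
        rows.append(delta)
    return rows


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def generate_trajectory(means, variances):
    """Most likely static trajectory given per-frame (static, delta) means and variances."""
    num_frames = len(means)
    w = build_window_matrix(num_frames)
    mu = [m for frame in means for m in frame]
    precision = [1.0 / v for frame in variances for v in frame]
    num_rows = 2 * num_frames
    # Normal equations: (W^T P W) c = W^T P mu, with P the diagonal precision matrix.
    lhs = [
        [sum(w[r][i] * precision[r] * w[r][j] for r in range(num_rows))
         for j in range(num_frames)]
        for i in range(num_frames)
    ]
    rhs = [sum(w[r][i] * precision[r] * mu[r] for r in range(num_rows))
           for i in range(num_frames)]
    return solve_linear_system(lhs, rhs)


# Two hypothetical states: static mean steps from 0 to 1, delta mean is 0 throughout.
step_means = [(0.0, 0.0)] * 3 + [(1.0, 0.0)] * 3
unit_variances = [(1.0, 1.0)] * 6
trajectory = generate_trajectory(step_means, unit_variances)
print([round(value, 3) for value in trajectory])
```

In this toy setting the zero-mean delta statistics pull the generated trajectory away from the piecewise-constant static means, so the step between the two states is smoothed. Lowering the delta variances strengthens that smoothing.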
|Title of host publication||Interspeech 2006 - ICSLP|
|Subtitle of host publication||Ninth International Conference on Spoken Language Processing, Proceedings of the|
|Publication status||Published - 2006|
|Event||Ninth International Conference on Spoken Language Processing (INTERSPEECH 2006 - ICSLP) - Pittsburgh, PA, United States|
Duration: 17 Sep 2006 → 21 Sep 2006