TY - GEN
T1 - Disfluency and speech recognition profile factors
AU - Aylett, Matthew P.
PY - 2003/9/1
Y1 - 2003/9/1
N2 - This paper reports on work bringing together disfluency coding carried out by Lickley [1] and recognition work carried out as part of the ERF project (Bard, Thompson, Isard [2]) at Edinburgh University. A set of factors is investigated that characterise the behaviour of the ASR during recognition, based on an analysis of the resulting word lattice. These factors can be grouped as: Entropy Factors - the entropy of the acoustic and language model likelihoods within the word lattice over a 10 ms frame; and Arc Factors - the number of non-unique and unique arcs in the word lattice in any given 10 ms time frame, together with the variance of the start and end times of these arcs, and the number of arcs starting or ending in the frame. The values of all factors were used to train a simple CART model. The CART model was used to predict recognition failure, interruption point location (the point where a disfluency begins), and whether the location was in a repair or a reparandum. The entropy of the language model values contributed most to the model's prediction of recognition failure and of whether a frame was in a repair or reparandum. In contrast, the number of unique word hypotheses contributed most to the successful prediction of a frame being close to an interruption point.
AB - This paper reports on work bringing together disfluency coding carried out by Lickley [1] and recognition work carried out as part of the ERF project (Bard, Thompson, Isard [2]) at Edinburgh University. A set of factors is investigated that characterise the behaviour of the ASR during recognition, based on an analysis of the resulting word lattice. These factors can be grouped as: Entropy Factors - the entropy of the acoustic and language model likelihoods within the word lattice over a 10 ms frame; and Arc Factors - the number of non-unique and unique arcs in the word lattice in any given 10 ms time frame, together with the variance of the start and end times of these arcs, and the number of arcs starting or ending in the frame. The values of all factors were used to train a simple CART model. The CART model was used to predict recognition failure, interruption point location (the point where a disfluency begins), and whether the location was in a repair or a reparandum. The entropy of the language model values contributed most to the model's prediction of recognition failure and of whether a frame was in a repair or reparandum. In contrast, the number of unique word hypotheses contributed most to the successful prediction of a frame being close to an interruption point.
M3 - Conference contribution
T3 - Gothenburg Papers in Theoretical Linguistics
SP - 51
EP - 54
BT - Proceedings of DiSS’03, Disfluency in Spontaneous Speech Workshop, 5–8 September 2003, Göteborg University, Sweden.
A2 - Eklund, Robert
PB - ISCA
T2 - Disfluency in Spontaneous Speech (DiSS'03)
Y2 - 5 September 2003 through 8 September 2003
ER -