Abstract / Description of output
User quality judgements can show a bewildering amount of variation that is difficult to capture using traditional quality prediction approaches. Using clustering, an exploratory statistical analysis technique, we reanalysed the data set of a Wizard-of-Oz experiment in which 25 users were asked to rate the dialogue after each turn. The sparse data problem was addressed by careful a priori parameter choices and by comparing the results of different clustering algorithms. We found two distinct classes of users, positive and critical. Positive users were generally happy with the dialogue system and did not mind errors. Critical users downgraded their opinion of the system after errors, used a wider range of ratings, and were less likely to rate the system positively overall. These user groups could not be predicted by experience with spoken dialogue systems, attitude to spoken dialogue systems, affinity with technology, demographics, or short-term memory capacity. We suggest that evaluation research should focus on critical users, and we discuss how these might be identified.
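The abstract describes clustering users into two groups by their rating behaviour. As a minimal illustrative sketch (not the study's actual method or data), the idea can be shown with a plain 2-means clustering over two hypothetical per-user features, mean rating and rating spread; the data below is invented for illustration only.

```python
import statistics

# Hypothetical per-turn ratings (1-5) for six users: three generally
# happy raters and three more critical raters with a wider spread.
# These numbers are invented, not from the study's data set.
users = [
    [5, 4, 5, 5, 4, 5, 5, 4],
    [4, 5, 5, 4, 5, 5, 5, 4],
    [5, 5, 4, 5, 5, 4, 5, 5],
    [2, 1, 3, 2, 4, 1, 2, 3],
    [3, 2, 1, 2, 3, 4, 2, 1],
    [1, 3, 2, 4, 2, 2, 3, 1],
]

# Represent each user by (mean rating, rating spread).
features = [(statistics.mean(r), statistics.pstdev(r)) for r in users]

def two_means(points, iters=20):
    """Plain 2-means on 2-D points, deterministically seeded from the
    two extreme points; returns one cluster label per point."""
    centroids = [min(points), max(points)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(
                range(2),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        for c in range(2):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = (
                    statistics.mean(m[0] for m in members),
                    statistics.mean(m[1] for m in members),
                )
    return labels

labels = two_means(features)
print(labels)  # → [1, 1, 1, 0, 0, 0]
```

On this toy data the two groups separate cleanly; the study instead compared several clustering algorithms to guard against artefacts of any single method on sparse data.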
Original language | English
---|---
Title of host publication | Proc. ISCA Workshop Perceptual Quality of Speech Systems, Dresden, Germany
Place of Publication | Dresden, Germany
Publication status | Published - 2010