Word, Syllable and Phoneme Based Metrics Do Not Correlate with Human Performance in ASR-Mediated Tasks

Anne H. Schneider, Johannes Hellrich, Saturnino Luz

Research output: Chapter in Book/Report/Conference proceeding › Other chapter contribution


Automatic evaluation metrics should correlate with human judgement. We collected sixteen ASR-mediated dialogues using a map task scenario. The material was assessed extrinsically (i.e. in context) through measures such as time to task completion, and intrinsically (i.e. out of context) using the word error rate and several variants of it that are based on smaller units. Extrinsic and intrinsic results did not correlate, either for word error rate or for metrics based on characters, syllables or phonemes.
Original language: Undefined/Unknown
Title of host publication: Advances in Natural Language Processing
Editors: Adam Przepiórkowski, Maciej Ogrodniczuk
Number of pages: 8
ISBN (Print): 978-3-319-10887-2
Publication status: Published - 2014

Publication series

Name: Lecture Notes in Computer Science
