Character-based Surprisal as a Model of Reading Difficulty in the Presence of Errors

Michael Hahn, Frank Keller, Yonatan Bisk, Yonatan Belinkov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Intuitively, human readers cope easily with errors in text; typos, misspellings, word substitutions, etc. do not unduly disrupt natural reading. Previous work indicates that letter transpositions result in increased reading times, but it is unclear whether this effect generalizes to more natural errors. In this paper, we report an eye-tracking study that compares two error types (letter transpositions and naturally occurring misspellings) and two error rates (10% or 50% of all words contain errors). We find that human readers show unimpaired comprehension in spite of these errors, but error words cause more reading difficulty than correct words. Also, transpositions are more difficult than misspellings, and a high error rate increases difficulty for all words, including correct ones. We then present a computational model that uses character-based (rather than traditional word-based) surprisal to account for these results. The model explains that transpositions are harder than misspellings because they contain unexpected letter combinations. It also explains the error rate effect: expectations about upcoming words are harder to compute when the context is degraded, leading to increased surprisal.
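As a rough illustration of why unexpected letter combinations raise character-based surprisal, the sketch below sums the standard surprisal, -log2 P(character | context), over the characters of a word. It is not the authors' model: it swaps in a toy character-bigram model with add-one smoothing, trained on a made-up sentence. The function names `train_char_bigram` and `word_surprisal`, the toy corpus, and the use of a space as a word-boundary symbol are all assumptions made for this example.

```python
import math
from collections import Counter


def train_char_bigram(text):
    """Collect character-bigram counts from a training string.

    Toy stand-in for a character-level language model (an assumption for
    illustration, not the model described in the abstract).
    """
    padded = " " + text  # assumed convention: a space marks a word boundary
    bigrams = Counter(zip(padded, padded[1:]))  # (previous char, current char)
    contexts = Counter(padded[:-1])             # how often each char is a left context
    alphabet = set(padded)
    return bigrams, contexts, alphabet


def word_surprisal(word, model):
    """Sum of -log2 P(char | previous char) over a word, in bits."""
    bigrams, contexts, alphabet = model
    chars = " " + word + " "  # include the boundary before and after the word
    total = 0.0
    for prev, cur in zip(chars, chars[1:]):
        # Add-one smoothing keeps unseen letter pairs at a small nonzero probability.
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(alphabet))
        total += -math.log2(p)
    return total


# Hypothetical toy corpus, chosen only so that "the" is frequent.
model = train_char_bigram("the cat sat on the mat and the dog sat on the log")
correct = word_surprisal("the", model)
transposed = word_surprisal("teh", model)
print(f"the: {correct:.2f} bits, teh: {transposed:.2f} bits")
```

In this toy setting the transposed form "teh" receives higher surprisal than "the" because the pairs "te", "eh" and "h " never occur in the training text, while a word-based model would simply treat "teh" as an unknown word.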
Original language: English
Title of host publication: Proceedings of the 41st Annual Conference of the Cognitive Science Society
Subtitle of host publication: Montreal 2019
Editors: Ashok Goel, Colleen Seifert, Christian Freksa
Publisher: Cognitive Science Society
Number of pages: 7
ISBN (Print): 0-9911967-7-5
Publication status: Published - 24 Jul 2019
Event: 41st Annual Meeting of the Cognitive Science Society - Palais des Congrès de Montréal, Montréal, Canada
Duration: 24 Jul 2019 → 27 Jul 2019
Conference number: 41


Conference: 41st Annual Meeting of the Cognitive Science Society
Abbreviated title: COGSCI 2019


  • human reading
  • eye-tracking
  • errors
  • computational modeling
  • surprisal
  • neural networks

