Making Test Corpora for Question Answering More Representative

Andrew Walker, Andrew Starkey, Jeff Z. Pan, Advaith Siddharthan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Despite two high-profile series of challenges devoted to question answering technologies, there remains no formal study into the representativeness that question corpora bear to real end-user inputs. We examine the corpora used presently and historically in the TREC and QALD challenges in juxtaposition with two more from natural sources and identify a degree of disjointedness between the two. We analyse these differences in depth before discussing a candidate approach to question corpora generation and provide a juxtaposition on its own representativeness. We conclude that these artificial corpora have good overall coverage of grammatical structures but the distribution is skewed, meaning performance measures may be inaccurate.
Original language: English
Title of host publication: Information Access Evaluation. Multilinguality, Multimodality, and Interaction
Subtitle of host publication: 5th International Conference of the CLEF Initiative, CLEF 2014, Sheffield, UK, September 15-18, 2014. Proceedings
Editors: Evangelos Kanoulas, Mihai Lupu, Paul Clough, Mark Sanderson, Mark Hall, Allan Hanbury, Elaine Toms
Place of Publication: Cham
Publisher: Springer
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 978-3-319-11382-1
ISBN (Print): 978-3-319-11381-4
DOIs
Publication status: Published - 18 Sept 2014
Event: 2014 Cross Language Evaluation Forum Conference, CLEF 2014 - Sheffield, United Kingdom
Duration: 15 Sept 2014 → 18 Sept 2014

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer, Cham
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2014 Cross Language Evaluation Forum Conference, CLEF 2014
Country/Territory: United Kingdom
City: Sheffield
Period: 15/09/14 → 18/09/14
