Repeatable and reliable search system evaluation using crowdsourcing

Roi Blanco, Harry Halpin, Daniel M. Herzig, Peter Mika, Jeffrey Pound, Henry S. Thompson, Thanh Tran Duc

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The primary problem confronting any new kind of search task is how to bootstrap a reliable and repeatable evaluation campaign, and a crowdsourcing approach provides many advantages. However, can these crowdsourced evaluations be repeated over long periods of time in a reliable manner? To demonstrate, we investigate creating an evaluation campaign for the semantic search task of keyword-based ad-hoc object retrieval. In contrast to traditional search over web pages with textual descriptions, object search aims at retrieving information from factual assertions about real-world objects. Using the first large-scale evaluation campaign that specifically targets the task of ad-hoc Web object retrieval over a number of deployed systems, we demonstrate that crowdsourced evaluation campaigns can be repeated over time and still maintain reliable results. Furthermore, we show that these results are comparable to those of expert judges when ranking systems, and that they hold over different evaluation and relevance metrics. This work provides empirical support for scalable, reliable, and repeatable search system evaluation using crowdsourcing.
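
As an illustration of the kind of agreement check the abstract describes, the following Python sketch compares the system ranking induced by crowdsourced judgments with the ranking induced by expert judgments, using Kendall's tau rank correlation. The system names, scores, and the choice of Kendall's tau are assumptions for illustration only and are not taken from the paper.

    # Minimal sketch (not the authors' code): do crowdsourced judgments rank
    # systems the same way expert judgments do? All values are hypothetical.
    from scipy.stats import kendalltau

    # Hypothetical mean effectiveness scores (e.g., NDCG) per system,
    # one set derived from crowdsourced judgments, one from expert judgments.
    crowd_scores = {"systemA": 0.41, "systemB": 0.35, "systemC": 0.52, "systemD": 0.28}
    expert_scores = {"systemA": 0.44, "systemB": 0.33, "systemC": 0.49, "systemD": 0.30}

    systems = sorted(crowd_scores)
    tau, p_value = kendalltau(
        [crowd_scores[s] for s in systems],
        [expert_scores[s] for s in systems],
    )
    print(f"Kendall's tau between crowd and expert rankings: {tau:.2f} (p={p_value:.3f})")

A tau close to 1 would indicate that the two sources of judgments order the evaluated systems in essentially the same way; the paper's actual evaluation and relevance metrics may differ from this sketch.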
Original language: English
Title of host publication: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Place of publication: New York, NY, USA
Publisher: ACM
Pages: 923-932
Number of pages: 10
ISBN (Print): 978-1-4503-0757-4
DOIs
Publication status: Published - 2011

Publication series

Name: SIGIR '11
Publisher: ACM
