Wikimarks: Harvesting Relevance Benchmarks from Wikipedia

Laura Dietz, Shubham Chatterjee, Connor Lennox, Sumanta Kashyapi, Pooja Oza, Ben Gamari

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

We provide a resource for automatically harvesting relevance benchmarks from Wikipedia, which we refer to as "Wikimarks" to differentiate them from manually created benchmarks. Unlike simulated benchmarks, they are based on manual annotations of Wikipedia authors. Studies on the TREC Complex Answer Retrieval track demonstrated that leaderboards under Wikimarks and manually annotated benchmarks are very similar. Because of their availability, Wikimarks can fill an important need for Information Retrieval research. We provide a meta-resource to harvest Wikimarks for several information retrieval tasks across different languages: paragraph retrieval, entity ranking, query-specific clustering, outline prediction, relevant entity linking, and many more. In addition, we provide example Wikimarks for English, Simple English, and Japanese derived from the 01/01/2022 Wikipedia dump. Resource available: https://trema-unh.github.io/wikimarks/

Original language: English
Title of host publication: SIGIR 2022 - Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: ACM
Pages: 3003-3012
Number of pages: 10
ISBN (Electronic): 9781450387323
DOIs
Publication status: Published - 6 Jul 2022
Event: 45th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2022 - Madrid, Spain
Duration: 11 Jul 2022 to 15 Jul 2022

Conference

Conference: 45th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2022
Country/Territory: Spain
City: Madrid
Period: 11/07/22 to 15/07/22

Keywords / Materials (for Non-textual outputs)

  • query-specific clustering
  • relevant entity linking
  • test collections
