Building Test Collection from Old IR Literature

Anirban Chakraborty, Kripabandhu Ghosh, Swapan Kumar Parui

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Standard test collections form the basis of Information Retrieval (IR) research and evaluation, and important datasets have been created to promote empirical research and experimentation. In this paper, we describe our endeavour to create a test collection from old, archived writings of IR stalwarts. The documents are produced in text format from scanned and OCRed versions of the originals. The test collection consists of a set of documents in TREC format along with a set of expert queries and their relevance assessments. Though small, this dataset should be of considerable interest to researchers and students of IR, since it contains valuable discourses on the discipline from its very inception. To the best of our knowledge, no standard IR dataset comprising old research articles has been built so far. Furthermore, no original, error-free digital text version of this dataset exists. Researchers using the collection must therefore run retrieval experiments on the erroneous text without scope for error modeling, which should invite new research ideas.
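For readers unfamiliar with the layout the abstract refers to, the following is a minimal sketch of how a TREC-format document file and a relevance-assessment (qrels) file are commonly read. It assumes the conventional <DOC>/<DOCNO>/<TEXT> tags and the usual four-column qrels line (topic, iteration, document number, relevance). The sample documents, identifiers such as IR-0001, and the function names are hypothetical illustrations, not taken from the actual collection, whose exact tag set may differ.

```python
import re

# Hypothetical sample data, NOT from the actual collection.
# The first document imitates a typical OCR confusion ("rn" for "m").
SAMPLE_DOCS = """<DOC>
<DOCNO> IR-0001 </DOCNO>
<TEXT>
Autornatic indexing of docurnents.
</TEXT>
</DOC>
<DOC>
<DOCNO> IR-0002 </DOCNO>
<TEXT>
Relevance feedback in information retrieval.
</TEXT>
</DOC>
"""

# Conventional qrels columns: topic, iteration, docno, relevance.
SAMPLE_QRELS = """1 0 IR-0001 1
1 0 IR-0002 0
"""

DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
DOCNO_RE = re.compile(r"<DOCNO>\s*(.*?)\s*</DOCNO>", re.DOTALL)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)


def parse_trec_docs(raw):
    """Return {docno: text} for every <DOC> block that has a <DOCNO>."""
    docs = {}
    for block in DOC_RE.findall(raw):
        docno = DOCNO_RE.search(block)
        if docno is None:
            continue  # skip malformed blocks rather than guess an id
        text = TEXT_RE.search(block)
        docs[docno.group(1)] = text.group(1).strip() if text else ""
    return docs


def parse_qrels(raw):
    """Return {topic: {docno: relevance}} from four-column qrels lines."""
    qrels = {}
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # ignore blank or malformed lines
        topic, _iteration, docno, rel = parts
        qrels.setdefault(topic, {})[docno] = int(rel)
    return qrels


docs = parse_trec_docs(SAMPLE_DOCS)
qrels = parse_qrels(SAMPLE_QRELS)
```

Because no error-free version of the text exists, a reader of this kind would index the OCRed text exactly as it appears in the <TEXT> field; any correction step would have to work without a clean reference.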
Original language: English
Title of host publication: Proceedings of the Forum for Information Retrieval Evaluation
Editors: Prasenjit Majumder, Mandar Mitra, Sukomal Pal, Madhulika Agrawal, Parth Mehta
Place of Publication: New York, NY, USA
Publisher: Association for Computing Machinery (ACM)
Number of pages: 5
ISBN (Print): 9781450337557
Publication status: Published - 5 Dec 2014
Event: 6th workshop of the Forum for Information Retrieval Evaluation - Bangalore, India
Duration: 5 Dec 2014 - 7 Dec 2014
Conference number: 6

Publication series

Name: FIRE '14
Publisher: Association for Computing Machinery


Workshop: 6th workshop of the Forum for Information Retrieval Evaluation
Abbreviated title: FIRE 2014

Keywords / Materials (for Non-textual outputs)

  • Test Collection
  • Old Literature
  • OCR Errors

