We explore the problem of retrieving semi-structured documents from a real-world collection using a structured query. We formally develop Structured Relevance Models (SRM), a retrieval model based on the idea that plausible values for a given field can be inferred from the context provided by the other fields in the record. We then carry out a set of experiments using a snapshot of the National Science Digital Library (NSDL) repository and queries that mention only fields missing from the test data. For such queries, typical field matching would retrieve no documents at all. In contrast, the SRM approach achieves a mean average precision of over twenty percent.
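The core intuition, that a missing field can be guessed from the fields that are present, can be illustrated with a small sketch. This is not the SRM formulation from the paper: the function `infer_field`, the smoothed word-overlap weighting, the `smoothing` parameter, and the toy records with their field names are all assumptions made up for illustration.

```python
from collections import Counter


def infer_field(records, partial, target, smoothing=0.1):
    """Estimate a distribution over values of `target` for a partial record.

    Each complete record votes for its own `target` value, weighted by a
    smoothed word overlap with `partial` on the fields that `partial` has.
    (Hypothetical weighting scheme, chosen only to keep the sketch short.)
    """
    scores = Counter()
    for rec in records:
        if target not in rec:
            continue  # this record cannot vote on the target field
        weight = 1.0
        for field, text in partial.items():
            mine = set(text.lower().split())
            theirs = set(rec.get(field, "").lower().split())
            # Smoothing keeps a single non-matching field from zeroing the vote.
            weight *= len(mine & theirs) + smoothing
        scores[rec[target]] += weight
    total = sum(scores.values())
    return {value: w / total for value, w in scores.items()} if total else {}


# Toy collection; field names and values are invented for this example.
records = [
    {"title": "Intro to algebra",
     "description": "linear equations for students",
     "audience": "high school"},
    {"title": "Algebra practice",
     "description": "solving equations",
     "audience": "high school"},
    {"title": "Galaxy formation",
     "description": "astrophysics of galaxies",
     "audience": "graduate"},
]

# A record whose "audience" field is missing.
partial = {"title": "Algebra equations worksheet",
           "description": "practice solving linear equations"}

dist = infer_field(records, partial, "audience")
best = max(dist, key=dist.get)
print(best, round(dist[best], 3))
```

In this toy setup, a query on the `audience` field could then be matched against the inferred distribution rather than against the empty field, which is why such a record need not be invisible to the query.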
|Title of host publication||Human Language Technology 2007|
|Subtitle of host publication||The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference April 22-27, 2007, Rochester, New York, USA|
|Number of pages||8|
|Publication status||Published - Apr 2007|