Patent Query Reduction Using Pseudo Relevance Feedback

Debasis Ganguly, Johannes Leveling, Walid Magdy, Gareth J.F. Jones

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Queries in patent prior art search are full patent applications and are much longer than standard ad hoc and web search topics. Standard information retrieval (IR) techniques are not entirely effective for patent prior art search because of ambiguous terms in these massive queries. Reducing patent queries by extracting key terms has been shown to be ineffective, mainly because it is not clear what the focus of the query is. An optimal query reduction algorithm must thus seek to retain the terms that are useful for retrieval, favouring recall of relevant patents, while removing terms that impair IR effectiveness. We propose a new query reduction technique that decomposes a patent application into constituent text segments and computes the Language Modeling (LM) similarity of each segment by calculating the probability of generating it from the top-ranked documents. We reduce a patent query by removing the least similar segments from the query, hypothesising that removal of these segments can increase the precision of retrieval while still retaining the useful context to achieve high recall. Experiments on the patent prior art search collection CLEF-IP 2010 show that the proposed method outperforms standard pseudo-relevance feedback (PRF) and a naive method of query reduction based on removal of unit frequency terms (UFTs).
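The segment-removal idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the smoothing method, the length normalisation, or how many segments are dropped, so the Jelinek-Mercer interpolation (`lam`), the per-token averaging, the whitespace tokenizer, and the `drop_fraction` parameter are all assumptions, and the function names `segment_scores` and `reduce_query` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization (assumption); a real system would also stem
    # and remove stopwords.
    return text.lower().split()


def segment_scores(segments, top_docs, collection, lam=0.5):
    """Average per-token log-probability of each query segment under a
    unigram model of the top-ranked documents, interpolated with a
    collection model (Jelinek-Mercer smoothing, assumed here)."""
    fb = Counter(t for d in top_docs for t in tokenize(d))
    coll = Counter(t for d in collection for t in tokenize(d))
    fb_len, coll_len = sum(fb.values()), sum(coll.values())
    vocab = len(coll) + 1  # add-one smoothing keeps unseen terms non-zero

    scores = []
    for seg in segments:
        tokens = tokenize(seg)
        if not tokens:
            scores.append(float("-inf"))  # empty segments rank last
            continue
        logp = 0.0
        for t in tokens:
            p_fb = fb[t] / fb_len if fb_len else 0.0
            p_coll = (coll[t] + 1) / (coll_len + vocab)
            logp += math.log(lam * p_fb + (1 - lam) * p_coll)
        scores.append(logp / len(tokens))  # normalise by segment length
    return scores


def reduce_query(segments, top_docs, collection, drop_fraction=0.25):
    """Drop the lowest-scoring fraction of segments, keeping original order."""
    scores = segment_scores(segments, top_docs, collection)
    n_drop = int(len(segments) * drop_fraction)
    drop = set(sorted(range(len(segments)), key=lambda i: scores[i])[:n_drop])
    return [s for i, s in enumerate(segments) if i not in drop]
```

In a PRF setting, `top_docs` would be the texts of the top-ranked documents returned by an initial retrieval run with the full query, and the list returned by `reduce_query` would be concatenated to form the reduced query for a second retrieval run.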
Original language: English
Title of host publication: Proceedings of the 20th ACM International Conference on Information and Knowledge Management
Place of Publication: New York, NY, USA
Number of pages: 4
ISBN (Print): 978-1-4503-0717-8
Publication status: Published - 2011

Publication series

Name: CIKM '11

