Semantic v.s. Positions: Utilizing Balanced Proximity in Language Model Smoothing for Information Retrieval

Rui Yan, Han Jiang, Mirella Lapata, Shou-De Lin, Xueqiang Lv, Xiaoming Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

Work on information retrieval has shown that language model smoothing leads to more accurate estimation of document models and hence is crucial for achieving good retrieval performance. Several smoothing methods have been proposed in the literature, using either semantic or positional information. In this paper, we propose a unified proximity-based framework that smooths language models by leveraging semantic and positional information simultaneously. The key idea is to project terms to positions where they do not originally occur (i.e., have zero count), which amounts to a word-count propagation process. We achieve this projection through two proximity-based density functions capturing semantic association and positional adjacency. We balance the effects of semantic and positional smoothing, and score a document based on the smoothed language model. Experiments on four standard TREC test collections show that our smoothing model is effective for information retrieval and generally performs better than the state of the art.
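The propagation step described in the abstract lends itself to a compact sketch. The Python snippet below is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a Gaussian kernel for positional adjacency, an externally supplied term-similarity oracle sem_sim for semantic association, and a single parameter lam playing the role of the balance between the two channels; the paper's exact density functions and balancing scheme may differ.

import math

def propagated_count(doc, w, i, sem_sim, sigma=25.0, lam=0.5):
    """Pseudo count of term w at position i of doc (a list of tokens).

    Counts are propagated to position i through two proximity channels,
    balanced by lam:
      - positional adjacency: a Gaussian kernel over |i - j| spreads the
        mass of real occurrences of w to nearby positions;
      - semantic association: the token at each position j passes mass to
        w in proportion to sem_sim(w, token).
    sem_sim is an assumed similarity function in [0, 1] (e.g. derived
    from co-occurrence statistics or embeddings); it is a stand-in for
    the paper's semantic density function.
    """
    total = 0.0
    for j, token in enumerate(doc):
        kernel = math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))
        positional = kernel if token == w else 0.0
        semantic = kernel * sem_sim(w, token)
        total += lam * positional + (1.0 - lam) * semantic
    return total

# Toy usage: a crude similarity stand-in, just to exercise the function.
sim = lambda a, b: 1.0 if a == b else 0.2
doc = "the cat sat on the mat".split()
print(propagated_count(doc, "cat", 4, sim))  # mass projected onto position 4

In a full retrieval model, these pseudo counts would replace raw counts in the document language model before standard smoothing (e.g. Dirichlet) and query-likelihood scoring.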
Original language: English
Title of host publication: Sixth International Joint Conference on Natural Language Processing
Pages: 507-515
Number of pages: 9
Publication status: Published - 2013
