Introduction to Probabilistic Models in IR

Victor P. Lavrenko

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Most of today's state-of-the-art retrieval models, including BM25 and language modeling, are grounded in probabilistic principles. A working knowledge of these principles helps researchers understand existing retrieval models better and shows industrial practitioners how such models can be applied to real-world problems.

This half-day tutorial will cover the fundamentals of two dominant probabilistic frameworks for Information Retrieval: the classical probabilistic model and the language modeling approach. The elements of the classical framework will include the probability ranking principle, the binary independence model, the 2-Poisson model, and the widely used BM25 model. Within the language modeling framework, we will discuss various distributional assumptions and smoothing techniques. Special attention will be devoted to the event spaces and independence assumptions underlying each approach. The tutorial will outline several techniques for modeling term dependence and addressing vocabulary mismatch. We will also survey applications of probabilistic models in the domains of cross-language and multimedia retrieval. The tutorial will conclude by suggesting a set of open problems in probabilistic models of IR.

Attendees should have a basic familiarity with probability and statistics. A brief refresher of basic concepts, including random variables, event spaces, conditional probabilities, and independence, will be given at the beginning of the tutorial. In addition to slides, some hands-on exercises and examples will be used throughout the tutorial.

Original language: English
Title of host publication: SIGIR 2010: PROCEEDINGS OF THE 33RD ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL
Editors: HH Chen, EN Efthimiadis, J Savoy, F Crestani, S Marchand-Maillet
Place of Publication: NEW YORK
Publisher: ASSOC COMPUTING MACHINERY
Pages: 905-905
Number of pages: 1
ISBN (Print): 978-1-60558-896-4
Publication status: Published - 2010
Event: 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - Geneva
Duration: 19 Jul 2010 → 23 Jul 2010

Conference

Conference: 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
City: Geneva
Period: 19/07/10 → 23/07/10
