To summarize a document, it is often useful to have a background set of documents from the same domain to serve as a reference for determining which information in the input document is new and important. We present a model based on Bayesian surprise that provides an intuitive way to identify surprising information in a summarization input with respect to a background corpus. Specifically, the method quantifies the degree to which pieces of information in the input change one’s beliefs about the world as represented in the background. We develop systems for generic and update summarization based on this idea. Our method provides competitive content selection performance, with particular advantages in the update task, where systems are given a small, topical background corpus.
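The abstract does not spell out the model. Bayesian surprise is commonly defined as the KL divergence between one's posterior and prior beliefs. The following minimal sketch makes that concrete under stated assumptions: beliefs are a Dirichlet distribution over word probabilities, the prior parameters are background word counts plus add-one smoothing, and the posterior adds the word counts of an input sentence. The helper names `make_prior`, `bayesian_surprise`, and the `smoothing` parameter are hypothetical illustrations, not the paper's implementation.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def digamma(x: float) -> float:
    """Digamma psi(x) for x > 0: upward recurrence, then an asymptotic series."""
    if x <= 0:
        raise ValueError("digamma is only implemented for x > 0")
    result = 0.0
    # psi(x) = psi(x + 1) - 1/x; shift x until the asymptotic series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def make_prior(background_tokens: Iterable[str], vocab: Iterable[str],
               smoothing: float = 1.0) -> Dict[str, float]:
    """Hypothetical prior: Dirichlet parameters = background counts + pseudo-counts."""
    counts = Counter(background_tokens)
    return {w: counts[w] + smoothing for w in set(vocab) | set(counts)}


def bayesian_surprise(prior: Dict[str, float], tokens: Iterable[str]) -> float:
    """KL(posterior || prior) after observing `tokens` under a Dirichlet-multinomial model."""
    obs = Counter(tokens)
    unknown = set(obs) - set(prior)
    if unknown:
        raise KeyError(f"tokens outside the prior vocabulary: {sorted(unknown)}")
    q0 = sum(prior.values())          # total prior concentration
    p0 = q0 + sum(obs.values())       # total posterior concentration
    psi_p0 = digamma(p0)
    kl = math.lgamma(p0) - math.lgamma(q0)
    # Words with no new observations have identical prior and posterior
    # parameters, so they contribute nothing and are skipped.
    for w, n in obs.items():
        b = prior[w]
        a = b + n
        kl += math.lgamma(b) - math.lgamma(a) + n * (digamma(a) - psi_p0)
    return kl


if __name__ == "__main__":
    background = "the market rose the market fell the market rose again".split()
    prior = make_prior(background, background + ["earthquake"])
    print(bayesian_surprise(prior, ["market"]))      # familiar word: lower surprise
    print(bayesian_surprise(prior, ["earthquake"]))  # unseen in background: higher surprise
```

Scoring each input sentence this way and preferring high-surprise sentences is one plausible content-selection heuristic; the paper's actual scoring and selection procedure may differ.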
| Field | Value |
|---|---|
| Title of host publication | Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) |
| Place of Publication | Baltimore, Maryland |
| Publisher | Association for Computational Linguistics |
| Number of pages | 6 |
| Publication status | Published - 1 Jun 2014 |