On the trade-off between redundancy and cohesiveness in extractive summarization

Ronald Cardenas, Matthias Gallé, Shay B. Cohen

Research output: Contribution to journal › Article › peer-review

Abstract / Description of output

Extractive summaries are usually presented as lists of sentences with no expected cohesion between them and, unless redundancy is explicitly accounted for, with plenty of redundant information. In this paper, we investigate the trade-offs incurred when aiming to control for inter-sentential cohesion and redundancy in extracted summaries, and the impact of these trade-offs on summary informativeness. As a case study, we focus on the summarization of long, highly redundant documents and consider two optimization scenarios: reward-guided and unsupervised. In the reward-guided scenario, we compare systems that control for redundancy and cohesiveness during sentence scoring. In the unsupervised scenario, we introduce two systems that aim to control all three properties (informativeness, redundancy, and cohesiveness) in a principled way. Both systems implement a psycholinguistic theory that simulates how humans keep track of relevant content units and how cohesiveness and non-redundancy constraints are applied in short-term memory during reading. Extensive automatic and human evaluations reveal that systems optimizing for cohesiveness, among other properties, are capable of organizing content in summaries better than systems that optimize only for redundancy, while maintaining comparable informativeness. We find that the proposed unsupervised systems manage to extract highly cohesive summaries across varying levels of document redundancy, although they sacrifice informativeness in the process. Finally, we provide evidence as to how simulated cognitive processes impact the trade-off between the analysed summary properties.
Original language: English
Pages (from-to): 273-326
Number of pages: 54
Journal: Journal of Artificial Intelligence Research
Publication status: Published - 6 Jun 2024

Keywords / Materials (for Non-textual outputs)

  • machine learning
  • cognitive modeling
  • natural language
  • reinforcement learning

