Verbose, Laconic or Just Right: A Simple Computational Model of Content Appropriateness under Length Constraints

Annie Louis, Ani Nenkova

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Length constraints impose implicit requirements on the type of content that can be included in a text. Here we propose the first model to computationally assess if a text deviates from these requirements. Specifically, our model predicts the appropriate length for texts based on content types present in a snippet of constant length. We consider a range of features to approximate content type, including syntactic phrasing, constituent compression probability, presence of named entities, sentence specificity and intersentence continuity. Weights for these features are learned using a corpus of summaries written by experts and on high quality journalistic writing. During test time, the difference between actual and predicted length allows us to quantify text verbosity. We use data from manual evaluation of summarization systems to assess the verbosity scores produced by our model. We show that the automatic verbosity scores are significantly negatively correlated with manual content quality scores given to the summaries.
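The scoring step described in the abstract can be illustrated with a minimal sketch. It assumes a plain linear predictor over snippet features and takes verbosity as actual length minus predicted length. The abstract does not state the form of the model or the sign convention, so both are assumptions here, and the function names, feature values, weights and bias are hypothetical placeholders rather than values from the paper.

```python
from typing import Sequence


def predict_length(features: Sequence[float],
                   weights: Sequence[float],
                   bias: float = 0.0) -> float:
    """Predict an appropriate text length from content-type features.

    Assumption: a simple linear combination of features; the paper's
    actual model may differ.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return bias + sum(w * x for w, x in zip(weights, features))


def verbosity_score(actual_length: float, predicted_length: float) -> float:
    """Difference between actual and predicted length.

    Assumption: positive values mean the text is longer than its content
    type would suggest; negative values mean it is shorter.
    """
    return actual_length - predicted_length


# Hypothetical feature values computed on a constant-length snippet
# (placeholders, not the paper's features or learned weights).
snippet_features = [0.5, 1.0, 2.0]
weights = [120.0, -40.0, 15.0]
bias = 100.0

predicted = predict_length(snippet_features, weights, bias)
score = verbosity_score(200, predicted)
print(predicted, score)
```

Under these assumptions, a text of 200 words whose snippet predicts a length of 150 words would receive a verbosity score of 50.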
Original language: English
Title of host publication: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics
Place of Publication: Gothenburg, Sweden
Publisher: Association for Computational Linguistics
Pages: 636-644
Number of pages: 9
Publication status: Published - 1 Apr 2014
