Length constraints impose implicit requirements on the type of content that can be included in a text. Here we propose the first model to computationally assess whether a text deviates from these requirements. Specifically, our model predicts the appropriate length for a text based on the content types present in a snippet of constant length. We consider a range of features to approximate content type, including syntactic phrasing, constituent compression probability, presence of named entities, sentence specificity and intersentence continuity. Weights for these features are learned from a corpus of expert-written summaries and from high-quality journalistic writing. At test time, the difference between actual and predicted length allows us to quantify text verbosity. We use data from manual evaluation of summarization systems to assess the verbosity scores produced by our model. We show that the automatic verbosity scores are significantly negatively correlated with the manual content quality scores given to the summaries.
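The scoring scheme described above can be sketched as a linear model that maps content-type features of a snippet to a predicted length, with verbosity defined as the gap between actual and predicted length. The feature names, weights, and values below are hypothetical illustrations, not taken from the paper:

```python
# Sketch of the verbosity-scoring idea: a linear model predicts the
# appropriate length of a text from content-type features measured on a
# fixed-length snippet; verbosity is actual length minus predicted length.
# All feature names and weights are illustrative assumptions.

def predict_length(features, weights, bias):
    """Predicted appropriate length (in words) from content-type features."""
    return bias + sum(weights[name] * value for name, value in features.items())

def verbosity_score(actual_length, features, weights, bias):
    """Positive scores suggest verbose text; negative scores, laconic text."""
    return actual_length - predict_length(features, weights, bias)

# Hypothetical learned weights over three of the feature families
# mentioned in the abstract: named-entity density, average sentence
# specificity, and intersentence continuity.
weights = {"ne_density": 40.0, "specificity": -25.0, "continuity": 10.0}
features = {"ne_density": 0.3, "specificity": 0.8, "continuity": 0.5}

# Predicted length: 100 + 12 - 20 + 5 = 97; a 110-word text scores 13,
# i.e. somewhat more verbose than its content types warrant.
score = verbosity_score(actual_length=110, features=features,
                        weights=weights, bias=100.0)
```

In the paper the weights are learned from reference corpora; this sketch only shows how a learned linear model would be applied at test time to produce a verbosity score.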
Title of host publication: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics
Place of publication: Gothenburg, Sweden
Publisher: Association for Computational Linguistics
Number of pages: 9
Publication status: Published - 1 Apr 2014