The HOLJ corpus: supporting summarisation of legal texts

Claire Grover, Ben Hachey, Ian Hughson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


We describe an XML-encoded corpus of texts in the legal domain which was gathered for an automatic summarisation project. We describe two distinct layers of annotation: manual annotation of the rhetorical status of sentences and an entirely automatic annotation process incorporating a host of individual linguistic processors. The manual rhetorical status annotation has been developed as training and testing material for a summarisation system based on the work of Teufel and Moens, while the automatic layer of annotation encodes linguistic information as features for a machine learning approach to rhetorical status classification.
Original language: English
Title of host publication: In Proceedings of the 5th International Workshop on Linguistically Interpreted Corpora
Number of pages: 7
Publication status: Published - 2004

