The Utility of Information Extraction in the Classification of Books

Tom Betts, Maria Milosavljevic, Jon Oberlander

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We describe work on automatically assigning classification labels to books using the Library of Congress Classification scheme. This task is non-trivial due to the volume and variety of books that exist. We explore the utility of Information Extraction (IE) techniques within this text categorisation (TC) task, automatically extracting structured information from the full text of books. Experimental evaluation of performance involves a corpus of books from Project Gutenberg. Results indicate that a classifier which combines methods and tools from IE and TC significantly improves over a state-of-the-art text classifier, achieving a classification performance of $F_{\beta=1} = 0.8099$.
Original language: English
Title of host publication: Advances in Information Retrieval
Subtitle of host publication: 29th European Conference on IR Research, ECIR 2007, Rome, Italy, April 2-5, 2007. Proceedings
Editors: Giambattista Amati, Claudio Carpineto, Giovanni Romano
Place of Publication: Rome
Publisher: Springer Berlin Heidelberg
Pages: 295-306
Number of pages: 12
ISBN (Electronic): 978-3-540-71496-5
ISBN (Print): 978-3-540-71494-1
Publication status: Published - 2007

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Berlin Heidelberg
Volume: 4425
ISSN (Print): 0302-9743

Keywords

  • Information Extraction
  • Named Entity Recognition
  • Book Categorisation
  • Project Gutenberg
  • Ontologies
  • Digital Libraries
