Collective Segmentation and Labeling of Distant Entities in Information Extraction

Charles Sutton, Andrew McCallum

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution


In information extraction, we often wish to identify all mentions of an entity, such as a person or organization. Traditionally, a group of words is labeled as an entity based only on local information. But information from throughout a document can be useful; for example, if the same word is used multiple times, it is likely to have the same label each time. We present a conditional random field (CRF) that explicitly represents dependencies between the labels of pairs of similar words in a document. On a standard information extraction data set, we show that learning these dependencies leads to a 13.7% reduction in error on the field that had caused the most repetition errors.
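As a rough illustration of the idea of linking the labels of similar words, the sketch below builds the edge set of such a model for one tokenized document. It is only a sketch under stated assumptions: the abstract does not define "similar", so the code assumes two words are similar when they are the same string and begin with an uppercase letter. The function names `skip_edges` and `model_edges` are hypothetical, and no learning or inference is shown.

```python
from collections import defaultdict
from itertools import combinations


def skip_edges(tokens):
    """Return index pairs (i, j), i < j, whose labels would be tied together.

    Assumption (not stated in the abstract): two words count as "similar"
    when they are the same string and start with an uppercase letter.
    """
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        if tok[:1].isupper():
            positions[tok].append(i)
    edges = []
    for idxs in positions.values():
        # Every pair of repeated mentions gets one long-range edge.
        edges.extend(combinations(idxs, 2))
    return sorted(edges)


def model_edges(tokens):
    """Return (chain edges between neighbours, long-range edges)."""
    chain = [(i, i + 1) for i in range(len(tokens) - 1)]
    return chain, skip_edges(tokens)


tokens = "Speaker : John Smith . Smith will talk about CRFs .".split()
chain, skips = model_edges(tokens)
print(skips)  # the two mentions of "Smith" are connected
```

A purely local model would use only the `chain` edges; the extra edges let evidence at one mention of a word influence the label of a distant mention of the same word.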
Original language: English
Title of host publication: ICML Workshop on Statistical Relational Learning and Its Connections to Other Fields
Number of pages: 7
Publication status: Published - 2004

