Abstract
In information extraction, we often wish to identify all mentions of an entity, such as a person or organization. Traditionally, a group of words is labeled as an entity based only on local information. But information from throughout a document can be useful; for example, if the same word is used multiple times, it is likely to have the same label each time. We present a CRF that explicitly represents dependencies between the labels of pairs of similar words in a document. On a standard information extraction data set, we show that learning these dependencies leads to a 13.7% reduction in error on the field that had caused the most repetition errors.
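As a rough illustration of the idea of connecting the labels of similar words, the sketch below builds the edge structure such a model might use: ordinary edges between adjacent tokens, plus extra long-range edges between repeated words. The similarity criterion used here (identical capitalized tokens) is an assumption made for the example, since the abstract says only "similar words"; the function names `linear_edges` and `skip_edges` and the sample sentence are likewise hypothetical.

```python
from itertools import combinations


def linear_edges(tokens):
    """Edges between adjacent label positions, as in a linear-chain CRF."""
    return [(i, i + 1) for i in range(len(tokens) - 1)]


def skip_edges(tokens):
    """Long-range edges between pairs of 'similar' tokens.

    Assumption for this sketch: two tokens are similar when they are
    identical strings that start with an uppercase letter.
    """
    positions = {}
    for i, tok in enumerate(tokens):
        if tok[:1].isupper():
            positions.setdefault(tok, []).append(i)

    edges = []
    for idxs in positions.values():
        # Connect every pair of occurrences of the same word.
        edges.extend(combinations(idxs, 2))
    return sorted(edges)


# Hypothetical example sentence, tokenized on whitespace.
tokens = "Speaker : John Smith . Professor Smith will talk at 3 pm .".split()

print(linear_edges(tokens))
print(skip_edges(tokens))
```

In this sketch the two occurrences of "Smith" end up linked by one extra edge, so a model with factors on such edges could share evidence between them (for example, the nearby "Speaker :" cue) rather than labeling each occurrence from local context alone.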
Original language | English |
---|---|
Title of host publication | ICML Workshop on Statistical Relational Learning and Its Connections to Other Fields |
Number of pages | 7 |
Publication status | Published - 2004 |