Discriminative models for sequences and trees, such as linear-chain conditional random fields (CRFs) and max-margin parsing, have shown great promise because they combine the ability to incorporate arbitrary input features with the benefits of principled global inference over their structured outputs. However, because parameter estimation in these models requires performing this global inference repeatedly, training can be very slow. We present piecewise training, a new training method that combines the speed of local training with the accuracy of global training by incorporating a limited amount of global information derived from previous errors of the model. On named-entity and part-of-speech data, we show that our new method not only trains in less than one-fifth of the time required by a CRF and yields improved accuracy over the maximum-entropy Markov model (MEMM), but, surprisingly, also provides a statistically significant gain in accuracy over the CRF. We also present preliminary results showing a potential application to efficient training of discriminative parsers.
Name: Center for Intelligent Information Retrieval Technical Reports