Hallucinated n-best lists for discriminative language modeling

Kenji Sagae, Maider Lehr, Emily Tucker Prud'hommeaux, Puyang Xu, Nathan Glenn, Damianos Karakos, Sanjeev Khudanpur, Brian Roark, Murat Saraclar, Izhak Shafran, Daniel M. Bikel, Chris Callison-Burch, Yuan Cao, Keith B. Hall, Eva Hasler, Philipp Koehn, Adam Lopez, Matt Post, Darcey Riley

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


This paper investigates semi-supervised methods for discriminative language modeling, whereby n-best lists are "hallucinated" for given reference text and are then used to train n-gram language models with the perceptron algorithm. We perform controlled experiments on a very strong baseline English CTS system, comparing three methods for simulating ASR output, and compare the results with training on "real" n-best list output from the baseline recognizer. We find that methods based on extracting phrasal cohorts, similar to methods from machine translation for extracting phrase tables, yielded the largest gains of our three methods, achieving over half of the WER reduction of the fully supervised methods.
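The training procedure the abstract describes can be illustrated with a minimal sketch of perceptron reranking over n-best lists: features are n-gram counts, and weights are promoted toward the reference and demoted away from the current top-scoring hypothesis. This is an assumption-laden illustration (the function names, feature set, and update target are ours, not the paper's; the paper updates toward an oracle hypothesis within the n-best list, which here is simplified to the reference itself).

```python
from collections import defaultdict

def ngram_features(tokens, n=2):
    """Count n-gram features (here unigrams and bigrams) for one hypothesis."""
    feats = defaultdict(int)
    for order in range(1, n + 1):
        for i in range(len(tokens) - order + 1):
            feats[tuple(tokens[i:i + order])] += 1
    return feats

def score(weights, feats):
    """Dot product of feature counts with the current weight vector."""
    return sum(weights[f] * c for f, c in feats.items())

def perceptron_rerank(nbest_lists, references, epochs=5):
    """Structured-perceptron sketch: for each (n-best list, reference) pair,
    if the top-scoring hypothesis differs from the reference, add the
    reference's feature counts to the weights and subtract the hypothesis's.
    Simplification: we update toward the reference, not an n-best oracle."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for hyps, ref in zip(nbest_lists, references):
            best = max(hyps, key=lambda h: score(weights, ngram_features(h)))
            if best != ref:
                for f, c in ngram_features(ref).items():
                    weights[f] += c
                for f, c in ngram_features(best).items():
                    weights[f] -= c
    return weights
```

In the semi-supervised setting the paper studies, the `nbest_lists` would not come from a recognizer but would be hallucinated from the reference text, e.g. by substituting acoustically confusable phrases ("phrasal cohorts") into the reference.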
Original language: English
Title of host publication: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2012, Kyoto, Japan, March 25-30, 2012
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 4
ISBN (Electronic): 978-1-4673-0044-5
ISBN (Print): 978-1-4673-0045-2
Publication status: Published - 2012


