Semi-supervised discriminative language modeling for Turkish ASR

Arda Çelebi, Hasim Sak, Erinç Dikici, Murat Saraclar, Maider Lehr, Emily Tucker Prud'hommeaux, Puyang Xu, Nathan Glenn, Damianos Karakos, Sanjeev Khudanpur, Brian Roark, Kenji Sagae, Izhak Shafran, Daniel M. Bikel, Chris Callison-Burch, Yuan Cao, Keith B. Hall, Eva Hasler, Philipp Koehn, Adam Lopez, Matt Post, Darcey Riley

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present our work on semi-supervised learning of discriminative language models, in which the negative examples for sentences in a text corpus are generated using confusion models for Turkish at various granularities: word, sub-word, syllable and phone levels. We experiment with different language models and various sampling strategies to select competing hypotheses for training with a variant of the perceptron algorithm. We find that morph-based confusion models, combined with a sample selection strategy that aims to match the error distribution of the baseline ASR system, give the best performance. We also observe that substituting half of the supervised training examples with examples obtained in a semi-supervised manner yields results similar to those of fully supervised training.
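The abstract describes training a discriminative language model with a perceptron on competing hypotheses that a confusion model generates from plain text. The block below is a minimal sketch of that idea, not the authors' method. It assumes a toy word-level confusion table (the dictionary `confusions` and every function name are hypothetical), unigram and bigram count features, single-word substitutions as the negative examples, and the plain structured perceptron update rather than the specific perceptron variant used in the paper. It omits the baseline ASR score, the sub-word, syllable and phone confusion models, and the sampling strategies.

```python
from collections import Counter


def ngram_features(words):
    """Unigram and bigram counts (hypothetical feature set), with boundary markers."""
    padded = ["<s>"] + list(words) + ["</s>"]
    feats = Counter(words)                  # unigrams over the real words
    feats.update(zip(padded, padded[1:]))   # bigrams, including <s> and </s>
    return feats


def score(weights, feats):
    """Linear model score: dot product of the weight vector and the features."""
    return sum(weights.get(f, 0.0) * c for f, c in feats.items())


def confusable_hypotheses(words, confusions):
    """Negatives from a toy confusion table: replace one word at a time."""
    hyps = []
    for i, w in enumerate(words):
        for alt in confusions.get(w, []):
            hyps.append(words[:i] + (alt,) + words[i + 1:])
    return hyps


def train_perceptron(corpus, confusions, epochs=5):
    """Plain structured perceptron: the reference text is treated as correct."""
    weights = {}
    for _ in range(epochs):
        for sentence in corpus:
            ref = tuple(sentence.split())
            negatives = confusable_hypotheses(ref, confusions)
            if not negatives:
                continue
            ref_feats = ngram_features(ref)
            best_neg = max(negatives, key=lambda h: score(weights, ngram_features(h)))
            # Ties count as mistakes, so training starts even with all-zero weights.
            if score(weights, ngram_features(best_neg)) >= score(weights, ref_feats):
                for f, c in ref_feats.items():            # promote the reference
                    weights[f] = weights.get(f, 0.0) + c
                for f, c in ngram_features(best_neg).items():  # demote the confusion
                    weights[f] = weights.get(f, 0.0) - c
    return weights


def rerank(weights, hypotheses):
    """Return the highest-scoring hypothesis string under the learned weights."""
    return max(hypotheses, key=lambda h: score(weights, ngram_features(tuple(h.split()))))


if __name__ == "__main__":
    # Toy Turkish text and a hypothetical confusion table.
    corpus = ["bugün evde kaldım", "yarın okula gideceğim"]
    confusions = {"evde": ["evden", "erde"], "okula": ["okulda"]}
    weights = train_perceptron(corpus, confusions, epochs=3)
    print(rerank(weights, ["bugün evden kaldım", "bugün evde kaldım", "bugün erde kaldım"]))
```

In a full system the learned weights would be combined with the baseline recognizer's score and applied to real ASR n-best lists; here the reranker sees only the features, so it can prefer only those hypotheses whose n-grams it was trained to promote.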
Original language: English
Title of host publication: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2012, Kyoto, Japan, March 25-30, 2012
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 5025-5028
Number of pages: 4
ISBN (Electronic): 978-1-4673-0044-5
ISBN (Print): 978-1-4673-0045-2
Publication status: Published - Mar 2012
