Acoustic Data-driven Pronunciation Lexicon for Large Vocabulary Speech Recognition

Liang Lu, Arnab Ghoshal, Steve Renals

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Speech recognition systems normally use handcrafted pronunciation lexicons designed by linguistic experts. Building and maintaining such a lexicon is expensive and time consuming. This paper concerns automatically learning a pronunciation lexicon for speech recognition. We assume the availability of a small seed lexicon and then learn the pronunciations of new words directly from speech that is transcribed at the word level. We present two implementations for refining the putative pronunciations of new words based on acoustic evidence. The first is an expectation-maximization (EM) algorithm based on weighted finite-state transducers (WFSTs); the second is its Viterbi approximation. We carried out experiments on the Switchboard corpus of conversational telephone speech. The expert lexicon contains more than 30,000 words, from which we randomly selected 5,000 words to form the seed lexicon. Using the proposed lexicon learning method, we significantly improved accuracy compared with a lexicon learned using a grapheme-to-phoneme transformation, and obtained a word error rate approaching that achieved with a fully handcrafted lexicon.
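The core idea of the EM update and its Viterbi approximation can be illustrated with a small sketch. The following is not the paper's implementation (which operates on WFSTs over the full acoustic model); it is a hypothetical, simplified re-estimation of pronunciation probabilities for a single word, given candidate pronunciations (e.g. from a G2P system) and made-up per-utterance acoustic log-likelihoods. The soft (EM) variant weights each candidate by its posterior responsibility, while the Viterbi variant counts only the single best candidate per utterance.

```python
import math
from collections import defaultdict

def estimate_pron_probs(candidates, utt_loglik, viterbi=False, iters=5):
    """Re-estimate pronunciation probabilities for one word.

    candidates: list of candidate pronunciations (e.g. G2P outputs).
    utt_loglik: dict mapping utterance id -> {pron: acoustic log-likelihood},
        a stand-in for the acoustic evidence used in the paper.
    viterbi: if True, use the hard (Viterbi) approximation, where each
        utterance counts only toward its single best pronunciation.
    """
    # Start from a uniform prior over the candidate pronunciations.
    probs = {p: 1.0 / len(candidates) for p in candidates}
    for _ in range(iters):
        counts = defaultdict(float)
        for scores in utt_loglik.values():
            # Unnormalized log-posterior: log prior + acoustic log-likelihood.
            post = {p: math.log(probs[p]) + scores[p] for p in candidates}
            if viterbi:
                # Hard assignment: one full count to the best candidate.
                counts[max(post, key=post.get)] += 1.0
            else:
                # Soft assignment: fractional counts from the posterior
                # (log-sum-exp normalization for numerical stability).
                z = max(post.values())
                expd = {p: math.exp(v - z) for p, v in post.items()}
                total = sum(expd.values())
                for p, e in expd.items():
                    counts[p] += e / total
        # M-step: renormalize counts into updated pronunciation probabilities.
        total = sum(counts.values())
        probs = {p: counts[p] / total for p in candidates}
    return probs

# Hypothetical example: two candidate pronunciations and three utterances.
cands = ["t ah m ey t ow", "t ah m aa t ow"]
liks = {
    "utt1": {cands[0]: -10.0, cands[1]: -14.0},
    "utt2": {cands[0]: -11.0, cands[1]: -12.5},
    "utt3": {cands[0]: -15.0, cands[1]: -9.0},
}
soft = estimate_pron_probs(cands, liks)                # EM (soft counts)
hard = estimate_pron_probs(cands, liks, viterbi=True)  # Viterbi approximation
```

In this toy setting, two of the three utterances favor the first pronunciation, so the Viterbi variant assigns it probability 2/3, while the EM variant yields a softer distribution reflecting how decisively each utterance prefers its winner.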
Original language: English
Title of host publication: Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
ISBN (Print): 978-1-4799-2756-2
Publication status: Published - 2013

