Edinburgh Research Explorer

Variable word rate N-grams

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Original language: English
Title of host publication: Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing
Subtitle of host publication: ICASSP '00
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 1591-1594
Volume: 3
ISBN (Print): 0-7803-6293-4
DOIs
Publication status: Published - 2000
Event: 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing - Hilton Hotel and Convention Center, Istanbul, Turkey
Duration: 5 Jun 2000 to 9 Jun 2000

Conference

Conference: 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing
Country: Turkey
City: Istanbul
Period: 5/06/00 to 9/06/00

Abstract

The rate of occurrence of words is not uniform but varies from document to document. Despite this observation, parameters for conventional N-gram language models are usually derived using the assumption of a constant word rate. In this paper we investigate the use of a variable word rate assumption, modelled by a Poisson distribution or a continuous mixture of Poissons. We present an approach to estimating the relative frequencies of words or N-grams that takes prior information about their occurrences into account. Discounting and smoothing schemes are also considered. On the Broadcast News task, the approach demonstrates a reduction in perplexity of up to 10%.
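
The contrast between a constant and a variable word rate can be illustrated with a small sketch. This is not the estimation procedure of the paper: it assumes a gamma mixing density for the continuous mixture of Poissons (which gives a negative binomial), a method-of-moments fit, and made-up per-document counts. All function names and data below are hypothetical.

```python
import math


def poisson_pmf(k, rate):
    """P(K = k) for a Poisson distribution with mean `rate` (rate > 0)."""
    return math.exp(-rate + k * math.log(rate) - math.lgamma(k + 1))


def gamma_poisson_pmf(k, shape, scale):
    """P(K = k) when rate ~ Gamma(shape, scale) and K | rate ~ Poisson(rate).

    Integrating out the rate gives a negative binomial distribution with
    mean shape * scale and variance shape * scale * (1 + scale).
    """
    log_p = (
        math.lgamma(k + shape) - math.lgamma(shape) - math.lgamma(k + 1)
        + k * math.log(scale / (1.0 + scale))
        - shape * math.log(1.0 + scale)
    )
    return math.exp(log_p)


def fit_gamma_poisson(counts):
    """Method-of-moments fit (an assumption of this sketch).

    Returns (shape, scale), or None when the counts are not overdispersed
    (variance <= mean), in which case a single Poisson is the better fit.
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / n
    if var <= mean:
        return None
    scale = var / mean - 1.0
    shape = mean / scale
    return shape, scale


# Hypothetical counts of one word in ten equal-length documents:
# absent from most documents, frequent in a few.
counts = [0, 0, 0, 5, 0, 1, 0, 7, 0, 0]
mean_rate = sum(counts) / len(counts)
shape, scale = fit_gamma_poisson(counts)

observed_zero = counts.count(0) / len(counts)
print("observed P(0)     :", observed_zero)
print("Poisson P(0)      :", poisson_pmf(0, mean_rate))
print("gamma-Poisson P(0):", gamma_poisson_pmf(0, shape, scale))
```

With bursty counts like these, a single Poisson rate underestimates how often the word is absent from a document, while the mixture places more probability on both zero counts and large counts.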


ID: 27450459