Edinburgh Research Explorer

Information extraction from broadcast news

Research output: Contribution to journal (article)

Original language: English
Pages (from-to): 1295-1310
Journal: Philosophical Transactions A: Mathematical, Physical and Engineering Sciences
Volume: 358
Issue number: 1769
Publication status: Published - 15 Apr 2000

Abstract

This paper discusses the development of trainable statistical models for extracting content from television and radio news broadcasts. In particular, we concentrate on statistical finite-state models for identifying proper names and other named entities in broadcast speech. Two models are presented: the first represents name class information as a word attribute; the second represents both word–word and class–class transitions explicitly. A common n-gram-based formulation is used for both models. The task of named-entity identification is characterized by relatively sparse training data, and issues related to smoothing are discussed. Experiments are reported using Hub-4E, the DARPA/NIST evaluation for North American broadcast news.

ID: 27448815