Edinburgh Research Explorer

Basic Gene Grammars and DNA-ChartParser for language processing of Escherichia coli promoter DNA sequences

Research output: Contribution to journal › Article

Original language: English
Pages (from-to): 226-236
Number of pages: 11
Journal: Bioinformatics
Volume: 17
Issue number: 3
Publication status: Published - 2001

Abstract

Motivation: The field of ‘DNA linguistics’ has emerged from pioneering work in computational linguistics and molecular biology. Most formal grammars in this field are expressed as Definite Clause Grammars, but these have computational limitations that must be overcome. The present study provides a new DNA parsing system comprising a logic grammar formalism, Basic Gene Grammars, and a bidirectional chart parser, DNA-ChartParser.

Results: Basic Gene Grammars (BGGs) are shown to represent many formulations of knowledge about Escherichia coli promoters, including knowledge acquired from human experts, consensus sequences, statistics (weight matrices), symbolic learning, and neural network learning. The DNA-ChartParser provides bidirectional parsing for BGGs and handles overlapping categories, gap categories, approximate pattern matching, and constraints. Together, BGGs and the DNA-ChartParser allowed different sources of knowledge for recognizing E. coli promoters to be combined, achieving better accuracy as assessed by parsing these DNA sequences in real-world data sets.
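
To make "consensus sequences", "gap categories", and "approximate pattern matching" concrete, the following is a minimal Python sketch and not the BGG formalism or the DNA-ChartParser itself. It assumes the textbook E. coli promoter consensus hexamers (TTGACA at -35, TATAAT at -10) and a 15 to 21 bp spacer, none of which are stated in the abstract. The function names and default thresholds are hypothetical, chosen only for illustration.

```python
from typing import Iterator, Tuple

# Assumption: textbook consensus hexamers, not taken from the abstract.
MINUS_35 = "TTGACA"  # -35 box consensus
MINUS_10 = "TATAAT"  # -10 box consensus


def mismatches(window: str, pattern: str) -> int:
    """Count positions where two equal-length strings differ (Hamming distance)."""
    return sum(a != b for a, b in zip(window, pattern))


def find_promoter_candidates(
    seq: str,
    max_mismatch: int = 1,  # hypothetical tolerance for approximate matching
    min_gap: int = 15,      # assumed shortest spacer between the two boxes
    max_gap: int = 21,      # assumed longest spacer between the two boxes
) -> Iterator[Tuple[int, int, int]]:
    """Yield (start_of_-35_box, gap_length, start_of_-10_box) for each candidate."""
    seq = seq.upper()
    n35, n10 = len(MINUS_35), len(MINUS_10)
    for i in range(len(seq) - n35 + 1):
        # Approximate match of the -35 box.
        if mismatches(seq[i:i + n35], MINUS_35) > max_mismatch:
            continue
        # The "gap" is a variable-length spacer of unconstrained bases.
        for gap in range(min_gap, max_gap + 1):
            j = i + n35 + gap
            if j + n10 > len(seq):
                break  # longer gaps would run past the end of the sequence
            # Approximate match of the -10 box.
            if mismatches(seq[j:j + n10], MINUS_10) <= max_mismatch:
                yield i, gap, j
```

Calling `list(find_promoter_candidates(dna, max_mismatch=0))` on a string `dna` lists only exact consensus matches, while raising `max_mismatch` admits weaker sites. A grammar-based parser as described in the abstract would additionally combine such evidence with weight matrices, learned rules, and constraints, which this sketch does not attempt.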

ID: 3505553