Finding syntactic structure in unparsed corpora: The Gsearch corpus query system.

S Corley, M Corley, F Keller, M W Crocker, S Trewin

Research output: Contribution to journal, Article, peer-reviewed

Abstract

The Gsearch system allows the selection of sentences by syntactic criteria from text corpora, even when these corpora contain no prior syntactic markup. This is achieved by means of a fast chart parser, which takes as input a grammar and a search expression specified by the user. Gsearch features a modular architecture that can be extended straightforwardly to give access to new corpora. The Gsearch architecture also allows interfacing with external linguistic resources (such as taggers and lexical databases). Gsearch can be used with graphical tools for visualizing the results of a query.
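The idea of selecting sentences by running a chart parser over an unparsed corpus can be illustrated with a minimal sketch. This is not the Gsearch implementation: it assumes the corpus is already part-of-speech tagged, and the toy grammar, the corpus, and the names `build_chart` and `find_matches` are all hypothetical, chosen only for illustration. The "search expression" is reduced here to a single target category.

```python
from collections import defaultdict

# Hypothetical toy grammar over part-of-speech tags (not a Gsearch grammar):
# each category maps to a list of alternative right-hand sides.
GRAMMAR = {
    "NP": [["DT", "NN"], ["DT", "JJ", "NN"], ["NP", "PP"]],
    "PP": [["IN", "NP"]],
    "VP": [["VB", "NP"], ["VP", "PP"]],
}

# Hypothetical tagged corpus: each sentence is a list of (word, tag) pairs.
CORPUS = [
    [("the", "DT"), ("dog", "NN"), ("saw", "VB"), ("a", "DT"),
     ("cat", "NN"), ("in", "IN"), ("the", "DT"), ("garden", "NN")],
    [("dogs", "NNS"), ("bark", "VB")],
]


def build_chart(tags, grammar):
    """Bottom-up chart: map each span (start, end) to the categories covering it."""
    n = len(tags)
    chart = defaultdict(set)
    for i, tag in enumerate(tags):
        chart[(i, i + 1)].add(tag)

    def matches(rhs, start, end):
        # True if the symbols in rhs can tile the span [start, end).
        if not rhs:
            return start == end
        return any(
            rhs[0] in chart[(start, mid)] and matches(rhs[1:], mid, end)
            for mid in range(start + 1, end + 1)
        )

    # Repeat until no new category is added, so rule order does not matter.
    changed = True
    while changed:
        changed = False
        for length in range(1, n + 1):
            for start in range(0, n - length + 1):
                end = start + length
                for lhs, alternatives in grammar.items():
                    if lhs in chart[(start, end)]:
                        continue
                    if any(matches(rhs, start, end) for rhs in alternatives):
                        chart[(start, end)].add(lhs)
                        changed = True
    return chart


def find_matches(corpus, grammar, target):
    """Yield (sentence_index, start, end) for every span parsed as `target`."""
    for idx, sentence in enumerate(corpus):
        tags = [tag for _, tag in sentence]
        chart = build_chart(tags, grammar)
        for (start, end), categories in sorted(chart.items()):
            if target in categories:
                yield idx, start, end


if __name__ == "__main__":
    for idx, start, end in find_matches(CORPUS, GRAMMAR, "PP"):
        words = [word for word, _ in CORPUS[idx][start:end]]
        print(idx, " ".join(words))
```

Because the grammar is supplied at query time, the corpus itself needs no syntactic markup; only sentences in which the parser finds the requested category are returned.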

Original language: English
Pages (from-to): 81-94
Number of pages: 14
Journal: Computers and the Humanities
Volume: 35
Issue number: 2
Publication status: Published - May 2001

Keywords

  • corpus search
  • parsing
  • syntactic annotation
  • SGML
  • computational linguistics
  • psycholinguistics
