Deterministic regular expressions in linear time

Benoît Groz, Sebastian Maneth, Slawek Staworko

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract / Description of output

Deterministic regular expressions are widely used in XML processing. For instance, all regular expressions in DTDs and XML Schemas are required to be deterministic. In this paper we show that determinism of a regular expression e can be tested in linear time. The best known algorithms, based on the Glushkov automaton, require O(σ|e|) time, where σ is the number of distinct symbols in e. We further show that matching a word w against an expression e can be achieved in combined linear time O(|e|+|w|), for a wide range of deterministic regular expressions: (i) star-free (for multiple input words), (ii) bounded-occurrence, i.e., expressions in which each symbol appears a bounded number of times, and (iii) bounded plus-depth, i.e., expressions in which the nesting depth of alternating plus (union) and concatenation symbols is bounded. Our algorithms use a new structural decomposition of the parse tree of e. For matching arbitrary deterministic regular expressions we present an O(|e| + |w|log log|e|) time algorithm.
Original languageEnglish
Title of host publicationProceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2012, Scottsdale, AZ, USA, May 20-24, 2012
PublisherACM
Pages49-60
Number of pages12
ISBN (Electronic)978-1-4503-1248-6
DOIs
Publication statusPublished - 2012

Fingerprint

Dive into the research topics of 'Deterministic regular expressions in linear time'. Together they form a unique fingerprint.

Cite this