Fast Approximate String Matching with Suffix Arrays and A* Parsing

Philipp Koehn, Jean Senellart

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We present a novel exact solution to the approximate string matching problem in the context of translation memories, where a text segment has to be matched against a large corpus, while allowing for errors. We use suffix arrays to detect exact n-gram matches, A* search heuristics to discard matches and A* parsing to validate candidate segments. The method outperforms the canonical baseline by a factor of 100, with average lookup times of 4.3–247ms for a segment in a realistic scenario.
Original languageEnglish
Title of host publicationProceedings for The Ninth Conference of the Association for Machine Translation in the Americas
PublisherAssociation for Machine Translation in the Americas, AMTA
Number of pages9
Publication statusPublished - 2010

Fingerprint

Dive into the research topics of 'Fast Approximate String Matching with Suffix Arrays and A* Parsing'. Together they form a unique fingerprint.

Cite this