Hierarchical Phrase-Based Translation with Suffix Arrays

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution


A major engineering challenge in statistical machine translation systems is the efficient representation of extremely large translation rulesets. In phrase-based models, this problem can be addressed by storing the training data in memory and using a suffix array as an efficient index to quickly look up and extract rules on the fly. Hierarchical phrase-based translation introduces the added wrinkle of source phrases with gaps. Lookup algorithms used for contiguous phrases no longer apply, and the best approximate pattern matching algorithms are much too slow, taking several minutes per sentence. We describe new lookup algorithms for hierarchical phrase-based translation that reduce the empirical computation time by nearly two orders of magnitude, making on-the-fly lookup feasible for source phrases with gaps.
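To make the contiguous-phrase baseline mentioned in the abstract concrete, the following is a minimal sketch of suffix-array lookup over a tokenized corpus. It is not the paper's algorithm for phrases with gaps; it only illustrates the standard idea of indexing in-memory training data with a suffix array and finding every occurrence of a contiguous phrase by binary search. The function names `build_suffix_array` and `find_occurrences`, the toy corpus, and the naive sort-based construction are all assumptions made for illustration.

```python
from typing import List, Sequence


def build_suffix_array(tokens: Sequence[str]) -> List[int]:
    """Return the start positions of all suffixes, sorted lexicographically.

    Naive construction (sorting suffix slices); fine for a toy corpus,
    far too slow for real training data.
    """
    toks = list(tokens)
    return sorted(range(len(toks)), key=lambda i: toks[i:])


def find_occurrences(tokens: Sequence[str], sa: List[int],
                     phrase: Sequence[str]) -> List[int]:
    """Return the sorted corpus positions where `phrase` occurs contiguously."""
    toks = list(tokens)
    pat = list(phrase)
    m = len(pat)

    # Lower bound: first suffix whose first m tokens are >= the phrase.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if toks[sa[mid]:sa[mid] + m] < pat:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose first m tokens are > the phrase.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if toks[sa[mid]:sa[mid] + m] <= pat:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with the phrase.
    return sorted(sa[start:lo])


# Hypothetical toy corpus standing in for the source side of training data.
corpus = "the cat sat on the mat and the cat slept".split()
sa = build_suffix_array(corpus)
matches = find_occurrences(corpus, sa, ["the", "cat"])
```

Each lookup costs two binary searches over the suffix array, which is why this approach works well for contiguous phrases. A source phrase with a gap does not correspond to a single contiguous range of the array, so this search cannot be applied to it directly; that is the case the paper's new algorithms address.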
Original language: English
Title of host publication: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)
Place of publication: Prague, Czech Republic
Publisher: Association for Computational Linguistics
Number of pages: 10
Publication status: Published - 1 Jun 2007

