Constructing a translation lexicon from comparable, non-parallel corpora

Daniel Marcu (Inventor), Kevin Knight (Inventor), Dragos Stefan Munteanu (Inventor), Philipp Koehn (Inventor)

Research output: Patent

Abstract / Description of output

A machine translation system may use non-parallel monolingual corpora to generate a translation lexicon. The system may identify identically spelled words in the two corpora, and use them as a seed lexicon. The system may use various clues, e.g., context and frequency, to identify and score other possible translation pairs, using the seed lexicon as a basis. An alternative system may use a small bilingual lexicon in addition to non-parallel corpora to learn translations of unknown words and to generate a parallel corpus.
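
The seed-and-score idea in the abstract can be illustrated with a minimal sketch. It is not the patented method: the function names, the `min_len` filter, the fixed context window, and the cosine similarity over seed-word context counts are all assumptions chosen for illustration, and the frequency clue mentioned in the abstract is not modeled.

```python
from collections import Counter
import math


def seed_lexicon(src_tokens, tgt_tokens, min_len=4):
    """Map each identically spelled word found in both corpora to itself.

    min_len is an assumed filter to skip short accidental matches.
    """
    shared = set(src_tokens) & set(tgt_tokens)
    return {w: w for w in shared if len(w) >= min_len}


def context_vector(tokens, word, seed_words, window=3):
    """Count seed words within `window` tokens of each occurrence of `word`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != word:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in seed_words:
                counts[tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def score_pair(src_tokens, tgt_tokens, src_word, tgt_word, seed, window=3):
    """Score a candidate translation pair by comparing seed-word contexts."""
    src_vec = context_vector(src_tokens, src_word, set(seed), window)
    # Project source-side context counts into the target vocabulary
    # through the seed lexicon so the two vectors share dimensions.
    projected = Counter()
    for k, v in src_vec.items():
        projected[seed[k]] += v
    tgt_vec = context_vector(tgt_tokens, tgt_word, set(seed.values()), window)
    return cosine(projected, tgt_vec)
```

With toy German and English token lists that share the spellings "park" and "hotel", a call such as `score_pair(src, tgt, "hund", "dog", seed)` ranks a candidate higher when both words occur near the same seed words. Real corpora would need far more data and additional clues before the scores become meaningful.
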
Original language: English
Patent number: US7620538 B2
Priority date: 26/03/02
Publication status: Published - 17 Nov 2009

