Should MT Systems Be Used as Black Boxes in CLIR?

Walid Magdy, Gareth J. F. Jones

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The translation stage in cross-language information retrieval (CLIR) acts as the main enabling stage to cross the language barrier between documents and queries. In recent years, machine translation (MT) systems have become the dominant approach to translation in CLIR. However, unlike information retrieval (IR), MT focuses on the morphological and syntactical quality of the sentence. This requires large training resources and high computational power for training and translation. We present a novel technique for MT designed specifically for CLIR. In this method, IR text pre-processing in the form of stop word removal and stemming is applied to the MT training corpus prior to the training phase. Applying this pre-processing step is found to significantly speed up the translation process without affecting the retrieval quality.
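
The pre-processing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop word list, the suffix-stripping rule (a crude stand-in for a real stemmer such as Porter's), and the function names `crude_stem`, `preprocess`, and `preprocess_corpus` are all assumptions made for illustration. The sketch only shows the general idea of normalising each training sentence before it is handed to MT training.

```python
import re

# Hypothetical, deliberately tiny stop word list; a real system would use a full one.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Crude suffix stripping as a stand-in for a real stemmer (an assumption).
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(sentence: str) -> str:
    """Lowercase, tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return " ".join(crude_stem(t) for t in tokens if t not in STOP_WORDS)


def preprocess_corpus(sentences):
    """Apply the same pre-processing to every training sentence."""
    return [preprocess(s) for s in sentences]


if __name__ == "__main__":
    corpus = ["The translated queries are matched to the documents."]
    print(preprocess_corpus(corpus))
```

Because stop words are dropped and inflected forms collapse to a shared stem, the processed corpus has fewer tokens and a smaller vocabulary than the raw text, which is consistent with the speed-up the abstract reports.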

This research is supported by the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation (CNGL) project at Dublin City University.
Original language: English
Title of host publication: Advances in Information Retrieval
Subtitle of host publication: 33rd European Conference on IR Research, ECIR 2011, Dublin, Ireland, April 18-21, 2011. Proceedings
Publisher: Springer Berlin Heidelberg
Pages: 683-686
Number of pages: 4
ISBN (Electronic): 978-3-642-20161-5
ISBN (Print): 978-3-642-20160-8
Publication status: Published - 2011

Publication series

Name: Lecture Notes in Computer Science (LNCS)
Publisher: Springer Berlin Heidelberg
Volume: 6611
ISSN (Print): 0302-9743
