A Parallel Corpus for Evaluating Machine Translation between Arabic and European Languages

Nizar Habash, Nasser Zalmout, Dima Taji, Hieu Hoang, Maverick Alzate

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract / Description of output

We present Arab-Acquis, a large publicly available dataset for evaluating machine translation between 22 European languages and Arabic. Arab-Acquis consists of over 12,000 sentences from the JRC-Acquis (Acquis Communautaire) corpus translated twice by professional translators, once from English and once from French, and totaling over 600,000 words. The corpus follows previous data splits in the literature for tuning, development, and testing. We describe the corpus and how it was created. We also present the first benchmarking results on translating to and from Arabic for 22 European languages.
Original language: English
Title of host publication: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
Place of publication: Valencia, Spain
Publisher: Association for Computational Linguistics
Number of pages: 7
Publication status: Published, 7 Apr 2017
Event: 15th EACL 2017 Software Demonstrations, Valencia, Spain
Duration: 3 Apr 2017 to 7 Apr 2017


Conference: 15th EACL 2017 Software Demonstrations
Abbreviated title: EACL 2017

