Arabic OCR Error Correction Using Character Segment Correction, Language Modeling, and Shallow Morphology

Walid Magdy, Kareem Darwish

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

This paper explores the use of a character-segment-based character correction model, language modeling, and shallow morphology for Arabic OCR error correction. Experimentation shows that character-segment-based correction is superior to single-character correction and that language modeling boosts correction by improving the ranking of candidate corrections, while shallow morphology had a small adverse effect. Further, given a sufficiently large corpus to extract a dictionary and to train a language model, word-based correction works well for a morphologically rich language such as Arabic.
Original language: English
Title of host publication: EMNLP 2006, Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, 22-23 July 2006, Sydney, Australia
Publisher: Association for Computational Linguistics (ACL)
Pages: 408-414
Number of pages: 7
Publication status: Published - 2006
