Spatially Prioritized and Persistent Text Detection and Decoding

Hsueh-Cheng Wang, Yafim Landa, Maurice Fallon, Seth Teller

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We show how to exploit temporal and spatial coherence to achieve efficient and effective text detection and decoding for a sensor suite moving through an environment in which text occurs at a variety of locations, scales and orientations with respect to the observer. Our method uses simultaneous localization and mapping (SLAM) to extract planar “tiles” representing scene surfaces. Multiple observations of each tile, captured from different observer poses, are aligned using homography transformations. Text is detected using the Discrete Cosine Transform (DCT) and Maximally Stable Extremal Regions (MSER), and decoded by an Optical Character Recognition (OCR) engine. The decoded characters are then clustered into character blocks to obtain a maximum-likelihood estimate (MLE) of the word configuration. This paper’s contributions include: (1) spatiotemporal fusion of tile observations via SLAM, prior to inspection, thereby improving the quality of the input data; and (2) combination of multiple noisy text observations into a single higher-confidence estimate of environmental text.
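
As a rough illustration of the idea behind contribution (2), the sketch below fuses several noisy readings of the same word into one estimate. It is not the paper's clustering or MLE procedure: it substitutes a simple confidence-weighted vote per character position, and it assumes the readings are already aligned to equal length. The function name `fuse_readings` and the sample observations are hypothetical.

```python
from collections import defaultdict


def fuse_readings(readings):
    """Fuse aligned, equal-length readings by confidence-weighted voting.

    readings: list of (text, confidence) pairs, one per observation.
    Returns the string built from the highest-scoring character
    at each position. Illustrative only; not the authors' method.
    """
    if not readings:
        return ""
    length = len(readings[0][0])
    if any(len(text) != length for text, _ in readings):
        raise ValueError("this sketch assumes equal-length, aligned readings")

    fused = []
    for i in range(length):
        # Sum the confidence of every observation voting for each character.
        scores = defaultdict(float)
        for text, conf in readings:
            scores[text[i]] += conf
        fused.append(max(scores, key=scores.get))
    return "".join(fused)


# Hypothetical OCR outputs for one sign seen from four observer poses.
observations = [("EX1T", 0.6), ("EXIT", 0.8), ("FXIT", 0.5), ("EXIT", 0.7)]
print(fuse_readings(observations))  # prints "EXIT"
```

Individual readings contain errors ("1" for "I", "F" for "E"), but no single error is supported by enough total confidence to win its position, so the fused estimate is more reliable than any one observation.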
Original language: English
Title of host publication: Camera-Based Document Analysis and Recognition
Subtitle of host publication: 5th International Workshop, CBDAR 2013, Washington, DC, USA, August 23, 2013, Revised Selected Papers
Editors: Masakazu Iwamura, Faisal Shafait
Publisher: Springer International Publishing
Pages: 3-17
Number of pages: 15
ISBN (Electronic): 978-3-319-05167-3
ISBN (Print): 978-3-319-05166-6
Publication status: Published - 2014

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer International Publishing
Volume: 8357
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Keywords

  • SLAM
  • Text detection
  • Video OCR
  • Multiple frame integration
  • DCT
  • MSER
  • Lexicon
  • Language model
