Work-stealing prefix scan: Addressing load imbalance in large-scale image registration

Marcin Copik, Tobias Grosser, Torsten Hoefler, Paolo Bientinesi, Benjamin Berkels

Research output: Contribution to journalArticlepeer-review

Abstract / Description of output

Parallelism patterns (e.g., map or reduce) have proven to be effective tools for parallelizing high-performance applications. In this paper, we study the recursive registration of a series of electron microscopy images - a time consuming and imbalanced computation necessary for nano-scale microscopy analysis. We show that by translating the image registration into a specific instance of the prefix scan, we can convert this seemingly sequential problem into a parallel computation that scales to over thousand of cores. We analyze a variety of scan algorithms that behave similarly for common low-compute operators and propose a novel work-stealing procedure for a hierarchical prefix scan. Our evaluation shows that by identifying a suitable and well-optimized prefix scan algorithm, we reduce time-to-solution on a series of 4,096 images spanning ten seconds of microscopy acquisition from over 10 hours to less than 3 minutes (using 1024 Intel Haswell cores), enabling derivation of material properties at nanoscale for long microscopy image series.
Original languageEnglish
Number of pages13
JournalIEEE Transactions on Parallel and Distributed Systems
Issue number3
Early online date7 Jul 2021
Publication statusPublished - 1 Mar 2022

Keywords / Materials (for Non-textual outputs)

  • prefix sum
  • parallel algorithms
  • work stealing
  • load balancing
  • image registration


Dive into the research topics of 'Work-stealing prefix scan: Addressing load imbalance in large-scale image registration'. Together they form a unique fingerprint.

Cite this