Generalisation of Recursive Doubling for AllReduce: Now with Simulation

Martin Ruefenacht, Mark Bull, Stephen Booth

Research output: Contribution to journalArticlepeer-review

Abstract / Description of output

The performance of AllReduce is crucial at scale. The recursive doubling with pairwise exchange algorithm theoretically achieves O(log2N) scaling for short messages with N peers, but is limited by improvements in network latency. A multi-way exchange can be implemented using message pipelining, which is easier to improve than latency. Using our method, recursive multiplying, we show reductions in execution time of between 8% and 40% of AllReduce on a Cray XC30 over recursive doubling. Using a custom simulator we further explore the dynamics of recursive multiplying.
Original languageEnglish
Pages (from-to)24-44
Number of pages21
JournalParallel Computing: Systems & Applications
Early online date18 Aug 2017
DOIs
Publication statusE-pub ahead of print - 18 Aug 2017

Keywords / Materials (for Non-textual outputs)

  • AllReduce
  • MPI
  • Scalability
  • Collective
  • Recursive Doubling
  • n-way
  • Message Pipelining

Fingerprint

Dive into the research topics of 'Generalisation of Recursive Doubling for AllReduce: Now with Simulation'. Together they form a unique fingerprint.

Cite this