Abstract / Description of output
The performance of AllReduce is crucial at scale. The recursive doubling algorithm, which uses pairwise exchange, theoretically achieves O(log₂ N) scaling for short messages with N peers, but further gains depend on improvements in network latency. A multi-way exchange can instead be implemented using message pipelining, which is easier to improve than latency. Using our method, recursive multiplying, we show reductions in AllReduce execution time of between 8% and 40% on a Cray XC30 compared with recursive doubling. Using a custom simulator, we further explore the dynamics of recursive multiplying.
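The round-count argument in the abstract can be illustrated with a minimal, single-process sketch. This is not the paper's implementation or simulator: the function names, the restriction to rank counts that are exact powers of the exchange width, and the use of summation as the reduction are all assumptions made for illustration, and no real message passing or pipelining is modelled.

```python
def _is_power_of(n, k):
    """Return True if n == k**m for some integer m >= 0."""
    while n > 1 and n % k == 0:
        n //= k
    return n == 1


def recursive_doubling_allreduce(values):
    """Simulate a sum-AllReduce with pairwise exchange.

    Assumption: the number of ranks is a power of two.
    Returns (per-rank results, number of exchange rounds).
    """
    n = len(values)
    assert n > 0 and _is_power_of(n, 2), "sketch assumes a power-of-two rank count"
    vals = list(values)
    rounds = 0
    dist = 1
    while dist < n:
        # Each rank combines its value with the partner whose index
        # differs in exactly one bit (rank XOR dist).
        vals = [vals[r] + vals[r ^ dist] for r in range(n)]
        dist *= 2
        rounds += 1
    return vals, rounds


def multiway_allreduce(values, k):
    """Simulate a hypothetical k-way generalisation of the exchange.

    Assumption: the number of ranks is a power of k (k >= 2).
    Returns (per-rank results, number of exchange rounds).
    """
    n = len(values)
    assert k >= 2 and n > 0 and _is_power_of(n, k), "sketch assumes n == k**m"
    vals = list(values)
    rounds = 0
    stride = 1
    while stride < n:
        new_vals = []
        for r in range(n):
            digit = (r // stride) % k
            base = r - digit * stride
            # Combine with the k - 1 peers that differ only in this
            # base-k digit of the rank index.
            new_vals.append(sum(vals[base + d * stride] for d in range(k)))
        vals = new_vals
        stride *= k
        rounds += 1
    return vals, rounds
```

Under these assumptions, 16 ranks need four rounds with pairwise exchange but two rounds with a 4-way exchange, at the cost of three messages per rank per round instead of one. That trade of fewer latency-bound rounds for more messages per round is the part that message pipelining would have to absorb.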
Original language | English |
---|---|
Pages (from-to) | 24-44 |
Number of pages | 21 |
Journal | Parallel Computing: Systems & Applications |
Early online date | 18 Aug 2017 |
DOIs | |
Publication status | E-pub ahead of print - 18 Aug 2017 |
Keywords / Materials (for Non-textual outputs)
- AllReduce
- MPI
- Scalability
- Collective
- Recursive Doubling
- n-way
- Message Pipelining
Fingerprint
Dive into the research topics of 'Generalisation of Recursive Doubling for AllReduce: Now with Simulation'. Together they form a unique fingerprint.
Projects
- 1 Finished
Profiles
- Mark Bull
  - Computer Systems
  - EPCC - Senior Research Fellow
  Person: Academic: Research Active, Academic: Research Active (Research Assistant)