Abstract / Description of output
To address NUMA performance anomalies, programmers often resort to application-specific optimizations that are not transferable to other programs, or to generic optimizations that do not perform well in all cases. Skeleton-based programming models allow NUMA optimizations to be abstracted on a pattern-by-pattern basis, freeing programmers from this complexity. As a case study, we investigate computations that can be implemented with stencil skeletons. We present an analysis of the behavior of a range of simple and complex stencil programs from the NAS and Rodinia benchmark suites under state-of-the-art NUMA-aware page placement (PP) schemes. We show that even though an application (or skeleton) may have implemented the correct, intuitive scheduling of data and work to threads, the resulting performance can be disrupted by an inappropriate PP scheme. In contrast, we show that a NUMA PP-aware stencil implementation scheme can achieve speedups of up to 12x over a similar scheme that uses the Linux default PP, and that this works across a set of complex stencil applications. Furthermore, we show that a supposed PP performance optimization in the Linux kernel never improves stencil performance and in some cases degrades it by up to 0.27x, and should therefore be deactivated by stencil skeleton implementations. Finally, we show that further speedups of up to 1.1x can be achieved by addressing a work imbalance issue caused by poor conventional understanding of NUMA PP.
Original language | English |
---|---|
Title of host publication | 24th International European Conference on Parallel and Distributed Computing |
Place of Publication | Turin, Italy |
Publisher | Springer |
Pages | 590-602 |
Number of pages | 12 |
ISBN (Electronic) | 978-3-319-96983-1 |
ISBN (Print) | 978-3-319-96982-4 |
DOIs | |
Publication status | Published - 2018 |
Event | 24th International European Conference on Parallel and Distributed Computing - Torino, Italy. Duration: 27 Aug 2018 → 31 Aug 2018. https://europar2018.org/ |
Publication series
Name | Lecture Notes in Computer Science (LNCS) |
---|---|
Publisher | Springer, Cham |
Volume | 11014 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 24th International European Conference on Parallel and Distributed Computing |
---|---|
Abbreviated title | Euro-Par 2018 |
Country/Territory | Italy |
City | Torino |
Period | 27/08/18 → 31/08/18 |
Internet address |
Fingerprint
Dive into the research topics of 'NUMA Optimizations for Algorithmic Skeletons'. Together they form a unique fingerprint.
Profiles
Murray Cole
- School of Informatics - Personal Chair of Patterned Parallel Computing
- Institute for Computing Systems Architecture
- Computer Systems
Person: Academic: Research Active