NUMA Optimizations for Algorithmic Skeletons

Paul Metzger, Murray Cole, Christian Fensch

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

To address NUMA performance anomalies, programmers often resort to application-specific optimizations that are not transferable to other programs, or to generic optimizations that do not perform well in all cases. Skeleton-based programming models allow NUMA optimizations to be abstracted on a pattern-by-pattern basis, freeing programmers from this complexity. As a case study, we investigate computations that can be implemented with stencil skeletons. We present an analysis of the behavior of a range of simple and complex stencil programs from the NAS and Rodinia benchmark suites under state-of-the-art NUMA-aware page placement (PP) schemes. We show that even though an application (or skeleton) may have implemented the correct, intuitive scheduling of data and work to threads, the resulting performance can be disrupted by an inappropriate PP scheme. In contrast, we show that a NUMA PP-aware stencil implementation scheme can achieve speedups of up to 12x over a similar scheme that uses the Linux default PP, and that this works across a set of complex stencil applications. Furthermore, we show that a supposed PP performance optimization in the Linux kernel never improves stencil performance, in some cases degrades it by up to 0.27x, and should therefore be deactivated by stencil skeleton implementations. Finally, we show that further speedups of up to 1.1x can be achieved by addressing a work imbalance issue caused by poor conventional understanding of NUMA PP.
Original language: English
Title of host publication: 24th International European Conference on Parallel and Distributed Computing
Place of Publication: Turin, Italy
Publisher: Springer
Pages: 590-602
Number of pages: 12
ISBN (Electronic): 978-3-319-96983-1
ISBN (Print): 978-3-319-96982-4
Publication status: Published - 2018
Event: 24th International European Conference on Parallel and Distributed Computing - Torino, Italy
Duration: 27 Aug 2018 to 31 Aug 2018
https://europar2018.org/

Publication series

Name: Lecture Notes in Computer Science (LNCS)
Publisher: Springer, Cham
Volume: 11014
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 24th International European Conference on Parallel and Distributed Computing
Abbreviated title: Euro-Par 2018
Country/Territory: Italy
City: Torino
Period: 27/08/18 to 31/08/18