Evaluating and optimising compiler code generation for NVIDIA Grace

Ricardo Jesus, Michele Weiland

Research output: Chapter in Book/Report/Conference proceeding, Chapter (peer-reviewed), peer-review

Abstract / Description of output

In this paper, we explore the performance of the main optimising compiler toolchains currently available for high-performance AArch64 processors, namely the Arm Compiler for Linux (ACFL), GNU, LLVM and the NVIDIA HPC (NVHPC) compilers, on the recently released NVIDIA Grace CPU. We evaluate the performance of these compilers using the RAJA Performance Suite (RAJAPerf) to understand where each compiler does best and why. We find that the compilers mostly generate well-optimised code on baseline sequential runs, with the gap between the fastest and slowest being only 8% on average. However, they exhibit much larger variations on threaded parallel runs, where the gap between the fastest and slowest code generated by the different compilers increases to roughly 33%. Furthermore, we investigate in detail those kernels where LLVM performs worst relative to the remaining compilers and propose optimisations to improve code generation in those cases. We show scenarios where the default compiler behaviour produces sub-optimal code and where adjusting compiler flags, such as those explicitly controlling loop unrolling, can improve performance significantly. In cases where this is insufficient, we propose the compiler-level changes necessary to enable improved code generation and unlock further optimisations. These improvements account for speedups of over 70% in some kernels.
Original language: English
Title of host publication: ICPP '24
Subtitle of host publication: Proceedings of the 53rd International Conference on Parallel Processing
Publisher: Association for Computing Machinery (ACM)
Pages: 691-700
Number of pages: 10
ISBN (Electronic): 9798400717932
Publication status: Published - 12 Aug 2024
Event: 53rd International Conference on Parallel Processing - Gotland, Sweden
Duration: 12 Aug 2024 to 15 Aug 2024
https://icpp2024.org/

Conference

Conference: 53rd International Conference on Parallel Processing
Abbreviated title: ICPP 2024
Country/Territory: Sweden
City: Gotland
Period: 12/08/24 to 15/08/24
