Abstract / Description of output
Irregular workloads are typically bottlenecked by the memory system. These workloads often use sparse data representations, e.g., compressed sparse row/column (CSR/CSC), to conserve space at the cost of complicated, irregular traversals. Such traversals access large volumes of data and offer little locality for caches and conventional prefetchers to exploit.
This paper presents Prodigy, a low-cost hardware-software co-design solution for intelligent prefetching to improve the memory latency of several important irregular workloads. Prodigy targets irregular workloads including graph analytics, sparse linear algebra, and fluid mechanics that exhibit two specific types of data-dependent memory access patterns. Prodigy adopts a “best of both worlds” approach by using static program information from software, and dynamic run-time information from hardware. The core of the system is the Data Indirection Graph (DIG)—a proposed compact representation used to express program semantics such as the layout and memory access patterns of key data structures. The DIG representation is agnostic to a particular data structure format and is demonstrated to work with several sparse formats including CSR and CSC. Program semantics are automatically captured with a compiler pass, encoded as a DIG, and inserted into the application binary. The DIG is then used to program a low-cost hardware prefetcher to fetch data according to an irregular algorithm’s data structure traversal pattern. We equip the prefetcher with a flexible prefetching algorithm that maintains timeliness by dynamically adapting its prefetch distance to an application’s execution pace.
We evaluate the performance, energy consumption, and transistor cost of Prodigy using a variety of algorithms from the GAP, HPCG, and NAS benchmark suites. We compare the performance of Prodigy against a non-prefetching baseline as well as state-of-the-art prefetchers. We show that by using just 0.8KB of storage, Prodigy outperforms a non-prefetching baseline by 2.6× and saves energy by 1.6×, on average. Prodigy also outperforms modern data prefetchers by 1.5–2.3×.
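As a rough illustration of the idea described in the abstract, the following sketch models a DIG in software: nodes stand for arrays, edges stand for data-dependent indirections, and a walk from a trigger element lists the elements a prefetcher could fetch ahead of the program. Everything here is an assumption made for illustration, not the paper's implementation: the class names, the `"single"` and `"ranged"` edge labels, the `prefetch_targets` helper, and the tiny CSR graph are all hypothetical. The real design is a hardware unit that operates on addresses and adapts its prefetch distance at run time, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One data structure in the sketch DIG: a named array (hypothetical)."""
    name: str
    data: list


@dataclass
class Edge:
    """An indirection from array `src` into array `dst`.

    kind is an assumed label:
      "single": src[i] is used directly as an index into dst
      "ranged": src[i] and src[i + 1] bound a contiguous range of dst
    """
    src: str
    dst: str
    kind: str


def prefetch_targets(nodes, edges, start, index):
    """List (array name, index) pairs reachable from nodes[start].data[index].

    Walks the edges breadth-first. Assumes the edge set is acyclic,
    as it is in the example below.
    """
    targets = []
    frontier = [(start, index)]
    while frontier:
        name, i = frontier.pop(0)
        targets.append((name, i))
        src = nodes[name].data
        for e in edges:
            if e.src != name:
                continue
            if e.kind == "ranged":
                nxt = range(src[i], src[i + 1])
            else:
                nxt = [src[i]]
            frontier.extend((e.dst, j) for j in nxt)
    return targets


# A 4-vertex graph in CSR form plus a BFS-style work queue (made-up data).
nodes = {
    "work_queue": Node("work_queue", [0, 2]),
    "offsets": Node("offsets", [0, 2, 3, 5, 6]),
    "neighbors": Node("neighbors", [1, 2, 2, 0, 3, 1]),
    "visited": Node("visited", [0, 0, 0, 0]),
}
edges = [
    Edge("work_queue", "offsets", "single"),
    Edge("offsets", "neighbors", "ranged"),
    Edge("neighbors", "visited", "single"),
]

# Elements that depend on work_queue[1], in the order the walk finds them.
result = prefetch_targets(nodes, edges, "work_queue", 1)
for name, i in result:
    print(name, i)
```

In this toy example, the walk starting at `work_queue[1]` reaches `offsets[2]`, then the neighbor range `neighbors[3]` and `neighbors[4]`, and finally the `visited` entries those neighbors index. Each step depends on a value loaded in the previous one, which is the kind of access chain that caches and stride-based prefetchers struggle to anticipate.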
Original language | English |
---|---|
Title of host publication | 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 654 - 667 |
Number of pages | 14 |
ISBN (Electronic) | 978-1-6654-2235-2 |
ISBN (Print) | 978-1-6654-4670-9 |
DOIs | |
Publication status | Published - 22 Apr 2021 |
Event | The 27th IEEE International Symposium on High-Performance Computer Architecture - Seoul, Korea, Republic of. Duration: 27 Feb 2021 → 3 Mar 2021. Conference number: 27. https://hpca-conf.org/2021/ |
Publication series
Name | |
---|---|
Publisher | IEEE |
ISSN (Print) | 1530-0897 |
ISSN (Electronic) | 2378-203X |
Conference
Conference | The 27th IEEE International Symposium on High-Performance Computer Architecture |
---|---|
Abbreviated title | HPCA 2021 |
Country/Territory | Korea, Republic of |
City | Seoul |
Period | 27/02/21 → 3/03/21 |
Internet address |
Keywords / Materials (for Non-textual outputs)
- DRAM stalls
- irregular workloads
- graph processing
- hardware-software co-design
- programming model
- programmer annotations
- compiler
- hardware prefetching