Poise: Balancing Thread-Level Parallelism and Memory System Performance in GPUs using Machine Learning
Abstract
GPUs employ a high degree of thread-level parallelism (TLP) to hide the long latency of memory operations. However, the consequent increase in demand on the memory system causes pathological effects such as cache thrashing and bandwidth bottlenecks. As a result, high degrees of TLP can adversely affect system throughput. In this paper, we present Poise, a novel approach for balancing TLP and memory system performance in GPUs. Poise has two major components: a machine learning framework and a hardware inference engine. The machine learning framework comprises a regression model that is trained offline on a set of profiled kernels to learn the best warp scheduling decisions. At runtime, the hardware inference engine uses the previously learned model to dynamically predict the best warp scheduling decisions for unseen applications. Poise therefore helps optimize entirely new applications without imposing any profiling, training, or programming burden on the end-user. Across a set of benchmarks that were unseen during training, Poise achieves a speedup of up to 2.94× and a harmonic mean speedup of 46.6% over the baseline greedy-then-oldest warp scheduler. Poise is extremely lightweight, incurring a minimal hardware overhead of around 41 bytes per SM. It also reduces overall energy consumption by an average of 51.6%. Furthermore, Poise outperforms the prior state-of-the-art warp scheduler by an average of 15.1%. In effect, Poise solves a complex hardware optimization problem with considerable accuracy and efficiency.
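The abstract describes a two-stage design: a regression model is fitted offline on profiled kernels, and its learned weights are then evaluated at runtime by a lightweight hardware engine to pick warp scheduling decisions for unseen kernels. The sketch below illustrates only that overall shape using ordinary least squares; the feature names, numeric values, and scheduling target are invented for illustration and are not taken from the paper.

```python
# Minimal sketch of an offline-regression / runtime-inference flow,
# assuming hypothetical kernel features and targets. None of the
# values or feature choices below come from the Poise paper.
import numpy as np

# --- Offline phase: fit a linear regression on profiled kernels -------
# Each row holds profiled features of one kernel (e.g. cache miss rate,
# bandwidth utilisation); all numbers are made up for illustration.
X = np.array([
    [0.72, 0.91],   # kernel A: high miss rate, high bandwidth pressure
    [0.15, 0.30],   # kernel B: cache-friendly
    [0.55, 0.60],   # kernel C: moderate contention
])
# Target: the best-performing warp-scheduling parameter found during
# profiling (e.g. a preferred TLP level); also made up.
y = np.array([8.0, 32.0, 16.0])

# Append a bias column and solve ordinary least squares.
Xb = np.hstack([X, np.ones((X.shape[0], 1))])
weights, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# --- Runtime phase: inference is a single dot product ------------------
def predict_schedule(features: np.ndarray) -> float:
    """Predict a warp-scheduling decision for an unseen kernel."""
    return float(np.append(features, 1.0) @ weights)

print(predict_schedule(np.array([0.40, 0.50])))  # unseen kernel
```

At runtime this model reduces to a handful of multiply-accumulates over stored weights, which is consistent with the tiny per-SM storage the abstract reports; anything much heavier would defeat the purpose of a hardware inference engine.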
| Original language | English |
| --- | --- |
| Title of host publication | 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA) |
| Place of Publication | Washington, DC, USA |
| Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
| Pages | 492-505 |
| Number of pages | 14 |
| ISBN (Electronic) | 978-1-7281-1444-6 |
| ISBN (Print) | 978-1-7281-1445-3 |
| DOIs | |
| Publication status | Published - 28 Mar 2019 |
| Event | 25th IEEE International Symposium on High-Performance Computer Architecture, Washington D.C., United States. Duration: 16 Feb 2019 → 20 Feb 2019. http://hpca2019.seas.gwu.edu/ |
Publication series
| Name | |
| --- | --- |
| Publisher | IEEE |
| ISSN (Print) | 1530-0897 |
| ISSN (Electronic) | 2378-203X |
Conference
| Conference | 25th IEEE International Symposium on High-Performance Computer Architecture |
| --- | --- |
| Abbreviated title | HPCA 2019 |
| Country/Territory | United States |
| City | Washington D.C. |
| Period | 16/02/19 → 20/02/19 |
| Internet address | http://hpca2019.seas.gwu.edu/ |
Projects
1 Finished

- C3 - Scalable & Verified Shared Memory via Consistency-directed Cache Coherence
  Nagarajan, V., Jackson, P. & Topham, N.
  9/11/15 → 30/04/19
  Project: Research
Profiles
- Nigel Topham
  - School of Informatics - Chair of Computer Systems
  - Institute for Computing Systems Architecture
  - Computer Systems

  Person: Academic: Research Active