Poise: Balancing Thread-Level Parallelism and Memory System Performance in GPUs using Machine Learning

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

GPUs employ a high degree of thread-level parallelism (TLP) to hide the long latency of memory operations. However, the consequent increase in demand on the memory system causes pathological effects such as cache thrashing and bandwidth bottlenecks. As a result, high degrees of TLP can adversely affect system throughput. In this paper, we present Poise, a novel approach for balancing TLP and memory system performance in GPUs. Poise has two major components: a machine learning framework and a hardware inference engine. The machine learning framework comprises a regression model that is trained offline on a set of profiled kernels to learn the best warp scheduling decisions. At runtime, the hardware inference engine uses the previously learned model to dynamically predict the best warp scheduling decisions for unseen applications. Therefore, Poise helps in optimizing entirely new applications without imposing any profiling, training or programming burden on the end-user. Across a set of benchmarks that were unseen during training, Poise achieves a speedup of up to 2.94× and a harmonic mean speedup of 46.6% over the baseline greedy-then-oldest warp scheduler. Poise is extremely lightweight and incurs a minimal hardware overhead of around 41 bytes per SM. It also reduces the overall energy consumption by an average of 51.6%. Furthermore, Poise outperforms the prior state-of-the-art warp scheduler by an average of 15.1%. In effect, Poise solves a complex hardware optimization problem with considerable accuracy and efficiency.
Original language: English
Title of host publication: 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)
Place of Publication: Washington, DC, USA
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 492-505
Number of pages: 14
ISBN (Electronic): 978-1-7281-1444-6
ISBN (Print): 978-1-7281-1445-3
Publication status: Published - 28 Mar 2019
Event: 25th IEEE International Symposium on High-Performance Computer Architecture - Washington D.C., United States
Duration: 16 Feb 2019 to 20 Feb 2019
http://hpca2019.seas.gwu.edu/

Publication series

Publisher: IEEE
ISSN (Print): 1530-0897
ISSN (Electronic): 2378-203X

Conference

Conference: 25th IEEE International Symposium on High-Performance Computer Architecture
Abbreviated title: HPCA 2019
Country/Territory: United States
City: Washington D.C.
Period: 16/02/19 to 20/02/19
