Characterizing memory bottlenecks in GPGPU workloads

S. Dublish, V. Nagarajan, N. Topham

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


GPUs are often limited by off-chip memory bandwidth. With the advent of general-purpose computing on GPUs, a cache hierarchy has been introduced to filter the bandwidth demand on off-chip memory. However, the cache hierarchy has its own bandwidth limitations in sustaining such high levels of memory traffic. In this work, we characterize the bandwidth bottlenecks present across the GPU memory hierarchy for general-purpose applications. We show that the performance improvement achieved by mitigating the bandwidth bottleneck in the cache hierarchy can exceed the speedup obtained by a memory system with a baseline cache hierarchy and high-bandwidth off-chip memory. We also show that addressing the bandwidth bottleneck in isolation at specific levels can be sub-optimal and even counter-productive. It is therefore imperative to resolve the bandwidth bottleneck synergistically across the different levels of the memory hierarchy.
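
The point about isolated fixes can be illustrated with a toy model. The sketch below is not the paper's methodology: it assumes that throughput is capped by whichever level of the hierarchy is most saturated, and every name and number in it (`Level`, `throughput`, `scale`, the bandwidth and traffic figures) is hypothetical and chosen only for illustration. It captures the sub-optimal case only. The counter-productive case reported in the abstract needs a richer model than a simple minimum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """One level of a memory hierarchy. All values are hypothetical."""
    name: str
    bandwidth: float  # bytes per cycle the level can deliver
    traffic: float    # bytes requested from this level per unit of work


def throughput(levels):
    """Work units per cycle, capped by the most saturated level."""
    return min(level.bandwidth / level.traffic for level in levels)


def bottleneck(levels):
    """Name of the level that currently caps throughput."""
    return min(levels, key=lambda level: level.bandwidth / level.traffic).name


def scale(levels, name, factor):
    """Return a copy of the hierarchy with one level's bandwidth scaled."""
    return [
        Level(l.name, l.bandwidth * factor if l.name == name else l.bandwidth, l.traffic)
        for l in levels
    ]


# Hypothetical baseline: each cache level filters half of the traffic it sees.
baseline = [
    Level("L1", bandwidth=128.0, traffic=64.0),
    Level("L2", bandwidth=32.0, traffic=32.0),
    Level("DRAM", bandwidth=20.0, traffic=16.0),
]

dram_only = scale(baseline, "DRAM", 2.0)   # faster off-chip memory alone
l2_only = scale(baseline, "L2", 2.0)       # faster L2 alone
both = scale(l2_only, "DRAM", 2.0)         # both levels addressed together

for label, config in [("baseline", baseline), ("DRAM x2", dram_only),
                      ("L2 x2", l2_only), ("L2 x2 + DRAM x2", both)]:
    print(f"{label:16s} throughput={throughput(config):.2f} "
          f"bottleneck={bottleneck(config)}")
```

With these assumed numbers, doubling only the off-chip bandwidth yields no gain because L2 remains the limiter, doubling only the L2 bandwidth moves the bottleneck to DRAM, and the full improvement appears only when both levels are addressed together.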
Original language: English
Title of host publication: 2016 IEEE International Symposium on Workload Characterization (IISWC)
Place of Publication: Providence, RI, USA
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 2
ISBN (Electronic): 978-1-5090-3896-1
ISBN (Print): 978-1-5090-3897-8
Publication status: Published - 10 Oct 2016
Event: 2016 IEEE International Symposium on Workload Characterization - Providence, United States
Duration: 25 Sep 2016 to 27 Sep 2016


Conference: 2016 IEEE International Symposium on Workload Characterization
Abbreviated title: IISWC 2016
Country/Territory: United States


