C3D: Mitigating the NUMA Bottleneck via Coherent DRAM Caches

Cheng-Chieh Huang, Rakesh Kumar, Marco Elver, Boris Grot, Vijayanand Nagarajan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Massive datasets prevalent in scale-out, enterprise, and high-performance computing are driving a trend toward ever-larger memory capacities per node. To satisfy the memory demands and maximize performance per unit cost, today’s commodity HPC and server nodes tend to feature multi-socket shared memory NUMA organizations. An important problem in these designs is the high latency of accessing memory on a remote socket, which degrades performance in workloads with large shared data working sets.
This work shows that emerging DRAM caches can help mitigate the NUMA bottleneck by filtering up to 98% of remote memory accesses. To be effective, these DRAM caches must be private to each socket to allow caching of remote memory, which comes with the challenge of ensuring coherence across multiple sockets and GBs of DRAM cache capacity. Moreover, the high access latency of DRAM caches, combined with high inter-socket communication latencies, can make hits to remote DRAM caches slower than main memory accesses. These features challenge existing coherence protocols optimized for on-chip caches with fast hits and modest storage capacity. Our solution to these challenges relies on two insights. First, keeping DRAM caches clean avoids the need to ever access a remote DRAM cache on a read. Second, a non-inclusive on-chip directory that avoids tracking blocks in the DRAM cache enables a light-weight protocol for guaranteeing coherence without the staggering directory costs. Our design, called Clean Coherent DRAM Caches (C3D), leverages these insights to improve performance by 6.4-50.7% in a quad-socket system versus a baseline without DRAM caches.
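The first insight can be illustrated with a toy model. The sketch below is not the C3D protocol itself: it ignores on-chip caches and the directory entirely, and the class names, the modulo address interleaving, and the broadcast invalidation on writes are all assumptions made for illustration. It only shows why a read miss can be served from home memory, without probing another socket's DRAM cache, when every DRAM cache is kept clean.

```python
from dataclasses import dataclass, field


@dataclass
class Socket:
    """One NUMA socket with a private, clean DRAM cache (toy model)."""
    sid: int
    memory: dict = field(default_factory=dict)      # blocks homed on this socket
    dram_cache: dict = field(default_factory=dict)  # clean copies of remote blocks


class ToySystem:
    """Hypothetical multi-socket model; on-chip caches and directory omitted."""

    def __init__(self, num_sockets: int):
        self.sockets = [Socket(i) for i in range(num_sockets)]
        self.remote_memory_reads = 0

    def home_of(self, addr: int) -> Socket:
        # Assumed interleaving: block address modulo the socket count.
        return self.sockets[addr % len(self.sockets)]

    def read(self, sid: int, addr: int) -> int:
        sock, home = self.sockets[sid], self.home_of(addr)
        if home is sock:
            return sock.memory.get(addr, 0)   # local memory access
        if addr in sock.dram_cache:
            return sock.dram_cache[addr]      # local DRAM cache hit
        # Miss: every DRAM cache is clean, so home memory holds the latest
        # value and no other socket's DRAM cache needs to be probed.
        self.remote_memory_reads += 1
        value = home.memory.get(addr, 0)
        sock.dram_cache[addr] = value
        return value

    def write(self, sid: int, addr: int, value: int) -> None:
        # Write through to home memory so no DRAM cache ever holds dirty data.
        self.home_of(addr).memory[addr] = value
        for other in self.sockets:
            if other.sid == sid:
                # Keep the writer's own cached copy identical to memory.
                if addr in other.dram_cache:
                    other.dram_cache[addr] = value
            else:
                # Assumption: stale copies are dropped by a broadcast
                # invalidation, since DRAM-cache blocks are not tracked.
                other.dram_cache.pop(addr, None)
```

In this model a repeated read of a remote block from the same socket is served locally, and a write from any socket forces the next remote reader back to home memory rather than to the writer's DRAM cache.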
Original language: English
Title of host publication: Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on
Place of Publication: Taipei, Taiwan
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 12
ISBN (Electronic): 978-1-5090-3508-3
ISBN (Print): 978-1-5090-3509-0
Publication status: Published - 15 Dec 2016
Event: 49th Annual IEEE/ACM International Symposium on Microarchitecture - Taipei, Taiwan, Province of China
Duration: 15 Oct 2016 to 19 Oct 2016


Conference: 49th Annual IEEE/ACM International Symposium on Microarchitecture
Abbreviated title: MICRO-49
Country: Taiwan, Province of China