Dvé: Improving DRAM reliability and performance on-demand via coherent replication

Adarsh Patil, Vijay Nagarajan, Rajeev Balasubramonian, Nicolai Oswald

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

As technologies continue to shrink, memory system failure rates have increased, demanding support for stronger forms of reliability. In this work, we take inspiration from the two-tier approach that decouples correction from detection and explore a novel extrapolation. We propose Dvé, a hardware-driven replication mechanism where data blocks are replicated in two different sockets across a cache-coherent NUMA system. Each data block is also accompanied by a code with strong error detection capabilities so that when an error is detected, correction is performed using the replica. Such an organization has the advantage of offering two independent points of access to data, which enables: (a) strong error correction that can recover from a range of faults affecting any of the components in the memory, up to and including the memory controller, and (b) higher performance by providing another nearer point of memory access. Dvé realizes both of these benefits via Coherent Replication, a technique that builds on top of existing cache coherence protocols not only to keep the replicas in sync for reliability, but also to provide coherent access to the replicas during fault-free operation for performance. Dvé can flexibly provide these benefits on-demand by simply using the provisioned memory capacity which, as reported in recent studies, is often underutilized in today's systems. Thus, Dvé introduces a unique design point that offers higher reliability and performance for workloads that do not require the entire memory capacity.
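
The detect-then-correct-from-replica idea in the abstract can be illustrated with a small software sketch. This is only a toy model under stated assumptions, not the hardware design from the paper: the class `ReplicatedMemory`, its methods, and the use of CRC-32 as the detection code are all hypothetical stand-ins, and the cache coherence protocol that keeps the replicas in sync is not modelled at all.

```python
import zlib


class ReplicatedMemory:
    """Toy model (hypothetical): two 'sockets', each holding (data, code) per block."""

    def __init__(self):
        # One dict per socket, mapping block address -> (data, detection code).
        self.sockets = [{}, {}]

    def write(self, addr, data):
        # Assumption: CRC-32 stands in for a strong detection-only code.
        code = zlib.crc32(data)
        for socket in self.sockets:
            socket[addr] = (bytearray(data), code)

    def flip_bit(self, socket_id, addr, bit):
        # Inject a single-bit fault into one copy, leaving its code untouched.
        data, _ = self.sockets[socket_id][addr]
        data[bit // 8] ^= 1 << (bit % 8)

    def read(self, addr, near=0):
        # Try the nearer copy first, then fall back to the replica.
        for socket_id in (near, 1 - near):
            data, code = self.sockets[socket_id][addr]
            if zlib.crc32(data) == code:
                # Overwrite the other copy if its detection check fails.
                other = self.sockets[1 - socket_id]
                other_data, other_code = other[addr]
                if zlib.crc32(other_data) != other_code:
                    other[addr] = (bytearray(data), code)
                return bytes(data)
        raise RuntimeError("error detected in both copies; cannot correct")


mem = ReplicatedMemory()
mem.write(0x40, b"cache block")
mem.flip_bit(0, 0x40, 5)           # fault in the near copy
recovered = mem.read(0x40, near=0)  # detected locally, served from the replica
```

In this sketch a read is served from the nearer copy whenever its check passes, which loosely mirrors the performance benefit described above, while a failed check falls back to the other copy and repairs the faulty one.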

Original language: English
Title of host publication: 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture ISCA 2021
Publisher: IEEE
Pages: 526-539
Number of pages: 14
ISBN (Electronic): 9781665433334
Publication status: Published - 4 Aug 2021
Event: 48th ACM/IEEE Annual International Symposium on Computer Architecture (ISCA) - Virtual, Online, Spain
Duration: 14 Jun 2021 to 19 Jun 2021
Conference number: 48

Publication series

Name: Proceedings - International Symposium on Computer Architecture
Volume: 2021-June
ISSN (Print): 1063-6897

Conference

Conference: 48th ACM/IEEE Annual International Symposium on Computer Architecture (ISCA)
Abbreviated title: ISCA 2021
Country/Territory: Spain
City: Virtual, Online
Period: 14/06/21 to 19/06/21

Keywords / Materials (for Non-textual outputs)

  • coherence
  • DRAM
  • memory systems
  • reliability
