Abstract / Description of output
As technologies continue to shrink, memory system failure rates have increased, demanding support for stronger forms of reliability. In this work, we take inspiration from the two-tier approach that decouples correction from detection and explore a novel extrapolation. We propose Dvé, a hardware-driven replication mechanism where data blocks are replicated in two different sockets across a cache-coherent NUMA system. Each data block is also accompanied by a code with strong error detection capabilities so that when an error is detected, correction is performed using the replica. Such an organization has the advantage of offering two independent points of access to data, which enables: (a) strong error correction that can recover from a range of faults affecting any of the components in the memory, up to and including the memory controller, and (b) higher performance by providing another, nearer point of memory access. Dvé realizes both of these benefits via Coherent Replication, a technique that builds on top of existing cache coherence protocols not only to keep the replicas in sync for reliability, but also to provide coherent access to the replicas during fault-free operation for performance. Dvé can flexibly provide these benefits on demand by simply using the provisioned memory capacity which, as reported in recent studies, is often underutilized in today's systems. Thus, Dvé introduces a unique design point that offers higher reliability and performance for workloads that do not require the entire memory capacity.
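The detect-then-correct-from-replica idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the class `ReplicatedMemory`, its methods, and the use of CRC-32 as the detection code are all hypothetical choices made for illustration, and the model ignores cache coherence entirely.

```python
import zlib


class ReplicatedMemory:
    """Toy model (hypothetical): every block lives on two 'sockets', each copy
    stored with a CRC-32 tag that stands in for a strong detection code."""

    def __init__(self, num_sockets=2):
        # One store per socket: address -> (data bytes, detection tag).
        self.sockets = [dict() for _ in range(num_sockets)]

    def write(self, addr, data):
        # Keep the replicas in sync by updating every copy on a write.
        tag = zlib.crc32(data)
        for store in self.sockets:
            store[addr] = (data, tag)

    def read(self, addr, local_socket=0):
        # Try the nearer (local) replica first, then the remote one.
        order = [local_socket] + [
            s for s in range(len(self.sockets)) if s != local_socket
        ]
        failed = []
        for s in order:
            data, tag = self.sockets[s][addr]
            if zlib.crc32(data) == tag:
                # Correction: overwrite any copy that failed detection.
                for bad in failed:
                    self.sockets[bad][addr] = (data, tag)
                return data
            failed.append(s)
        raise RuntimeError("error detected in every replica; cannot correct")

    def inject_fault(self, addr, socket):
        # Flip one bit of the stored data without updating the tag.
        data, tag = self.sockets[socket][addr]
        self.sockets[socket][addr] = (bytes([data[0] ^ 0x01]) + data[1:], tag)


mem = ReplicatedMemory()
mem.write(0x40, b"cache block")
mem.inject_fault(0x40, socket=0)
recovered = mem.read(0x40, local_socket=0)  # falls back to socket 1, repairs socket 0
```

In this model a read served from the local copy never touches the remote socket, which mirrors the performance argument, while a failed check on one copy is repaired from the other, which mirrors the reliability argument. A fault in both copies is detected but cannot be corrected.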
Original language | English |
---|---|
Title of host publication | 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture ISCA 2021 |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 526-539 |
Number of pages | 14 |
ISBN (Electronic) | 9781665433334 |
Publication status | Published - 4 Aug 2021 |
Event | 48th ACM/IEEE Annual International Symposium on Computer Architecture (ISCA) - Virtual, Online, Spain; Duration: 14 Jun 2021 → 19 Jun 2021; Conference number: 48 |
Publication series
Name | Proceedings - International Symposium on Computer Architecture |
---|---|
Volume | 2021-June |
ISSN (Print) | 1063-6897 |
Conference
Conference | 48th ACM/IEEE Annual International Symposium on Computer Architecture (ISCA) |
---|---|
Abbreviated title | ISCA 2021 |
Country/Territory | Spain |
City | Virtual, Online |
Period | 14/06/21 → 19/06/21 |
Keywords / Materials (for Non-textual outputs)
- coherence
- DRAM
- memory systems
- reliability