Hermes: a Fast, Fault-Tolerant and Linearizable Replication Protocol

Antonis Katsarakis, Vasileios Gavrielatos, M.R. Siavash Katebzadeh, Arpit Joshi, Aleksandar Dragojevic, Boris Grot, Vijay Nagarajan

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Today's datacenter applications are underpinned by datastores that are responsible for providing availability, consistency, and performance. For high availability in the presence of failures, these datastores replicate data across several nodes. This is accomplished with the help of a reliable replication protocol that is responsible for keeping the replicas strongly consistent even when faults occur. Strong consistency is preferred to weaker consistency models that cannot guarantee intuitive behavior for clients. Furthermore, to accommodate high demand at real-time latencies, datastores must deliver high throughput and low latency.

This work introduces Hermes, a broadcast-based reliable replication protocol for in-memory datastores that provides both high throughput and low latency by enabling local reads and fully concurrent fast writes at all replicas. Hermes couples logical timestamps with cache-coherence-inspired invalidations to guarantee linearizability, avoid write serialization at a centralized ordering point, resolve write conflicts locally at each replica (hence ensuring that writes never abort), and provide fault tolerance via replayable writes. Our implementation of Hermes over an RDMA-enabled reliable datastore with five replicas shows that Hermes consistently achieves higher throughput than state-of-the-art RDMA-based reliable protocols (ZAB and CRAQ) across all write ratios while also significantly reducing tail latency. At 5% writes, the tail latency of Hermes is 3.6× lower than that of CRAQ and ZAB.
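As a rough illustration of how logical timestamps and invalidations can let every replica resolve concurrent writes locally, the following is a minimal single-key sketch. It is not the authors' implementation: the names `Timestamp`, `Replica`, `on_invalidate`, `on_validate`, and `concurrent_writes_demo` are hypothetical, the (version, node id) timestamp layout is an assumption, messages are plain function calls rather than RDMA broadcasts, and failure handling and write replay are omitted.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    VALID = "valid"
    INVALID = "invalid"


@dataclass(frozen=True, order=True)
class Timestamp:
    # Assumed layout: compared lexicographically, so the version decides
    # first and the writer's node id breaks ties between concurrent writes.
    version: int
    node_id: int


@dataclass
class Replica:
    node_id: int
    value: object = None
    ts: Timestamp = Timestamp(0, 0)
    state: State = State.VALID

    def next_timestamp(self) -> Timestamp:
        # Any replica may coordinate a write; there is no central orderer.
        return Timestamp(self.ts.version + 1, self.node_id)

    def on_invalidate(self, ts: Timestamp, value: object) -> bool:
        # Adopt the write only if it is newer than what this replica holds.
        if ts > self.ts:
            self.ts, self.value, self.state = ts, value, State.INVALID
        return True  # always acknowledge; a losing write is not aborted

    def on_validate(self, ts: Timestamp) -> None:
        # A validation for a superseded timestamp is simply ignored.
        if ts == self.ts:
            self.state = State.VALID

    def read(self) -> object:
        # Local read: allowed only while the key is valid. A real system
        # would stall the read instead of raising.
        if self.state is not State.VALID:
            raise RuntimeError("key is invalidated; the read must wait")
        return self.value


def concurrent_writes_demo() -> list:
    replicas = [Replica(node_id=i) for i in range(3)]
    ts_a = replicas[0].next_timestamp()  # write "a" coordinated by node 0
    ts_b = replicas[2].next_timestamp()  # write "b" coordinated by node 2
    # The two invalidations reach the replicas in different orders.
    deliveries = [
        [(ts_a, "a"), (ts_b, "b")],
        [(ts_b, "b"), (ts_a, "a")],
        [(ts_b, "b"), (ts_a, "a")],
    ]
    for replica, messages in zip(replicas, deliveries):
        for ts, value in messages:
            replica.on_invalidate(ts, value)
    # Both coordinators collected all acknowledgements and now validate.
    for replica in replicas:
        replica.on_validate(ts_a)
        replica.on_validate(ts_b)
    return [replica.read() for replica in replicas]
```

In this sketch, calling `concurrent_writes_demo()` leaves all three replicas holding the same value regardless of delivery order, because each one independently keeps the write with the higher timestamp.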
Original language: English
Title of host publication: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
Place of Publication: Lausanne, Switzerland
Publisher: Association for Computing Machinery (ACM)
Pages: 201-217
Number of pages: 17
ISBN (Print): 9781450371025
Publication status: Published - 9 Mar 2020
Event: 25th International Conference on Architectural Support for Programming Languages and Operating Systems - Lausanne, Switzerland
Duration: 16 Mar 2020 to 20 Mar 2020
Conference number: 25
https://asplos-conference.org/

Publication series

Name: ASPLOS '20

Conference

Conference: 25th International Conference on Architectural Support for Programming Languages and Operating Systems
Abbreviated title: ASPLOS 2020
Country: Switzerland
City: Lausanne
Period: 16/03/20 to 20/03/20

Keywords

  • replication
  • throughput
  • latency
  • consistency
  • fault-tolerant
  • availability
  • rdma
  • linearizability
