Reliable Data-center Scale Computations

Pramod Bhatotia, Alexander Wieder, Rodrigo Rodrigues, Flavio Junqueira, Benjamin Reed

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Neither of the two broad classes of fault models considered by traditional fault tolerance techniques, crash and Byzantine faults, suits the environment of systems that run in today's data centers. On the one hand, assuming Byzantine faults is considered overkill, because it presumes worst-case adversarial behavior and because other techniques are already used to guard against malicious attacks. On the other hand, the crash fault model is insufficient, since it does not capture non-crash faults that may result from a variety of unexpected conditions that are commonplace in this setting. In this paper, we present the case for a more practical approach to handling non-crash (but non-adversarial) faults in data-center scale computations. In this context, we discuss how such a problem can be tackled for an important class of data-center scale systems: systems for large-scale processing of data, with a particular focus on the Pig programming framework. Such an approach not only covers a significant fraction of the processing jobs that run in today's data centers, but is also potentially applicable to a broader class of applications.
Original language: English
Title of host publication: Proceedings of the 4th International Workshop on Large Scale Distributed Systems and Middleware
Place of Publication: New York, NY, USA
Publisher: ACM
Pages: 1-6
Number of pages: 6
ISBN (Print): 978-1-4503-0406-1
Publication status: Published - Jun 2010

Publication series

Name: LADIS '10
Publisher: ACM
