Reliable Data-center Scale Computations

Pramod Bhatotia, Alexander Wieder, Rodrigo Rodrigues, Flavio Junqueira, Benjamin Reed

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution


Neither of the two broad classes of fault models considered by traditional fault tolerance techniques, crash and Byzantine faults, suits the environment of systems that run in today's data centers. On the one hand, assuming Byzantine faults is considered overkill, both because it assumes worst-case adversarial behavior and because other techniques are already used to guard against malicious attacks. On the other hand, the crash fault model is insufficient, since it does not capture non-crash faults that may result from a variety of unexpected conditions that are commonplace in this setting. In this paper, we present the case for a more practical approach to handling non-crash (but non-adversarial) faults in data-center scale computations. In this context, we discuss how this problem can be tackled for an important class of data-center scale systems: systems for large-scale processing of data, with a particular focus on the Pig programming framework. Such an approach not only covers a significant fraction of the processing jobs that run in today's data centers, but is also potentially applicable to a broader class of applications.
Original language: English
Title of host publication: Proceedings of the 4th International Workshop on Large Scale Distributed Systems and Middleware
Place of Publication: New York, NY, USA
Number of pages: 6
ISBN (Print): 978-1-4503-0406-1
Publication status: Published - Jun 2010

Publication series

Name: LADIS '10

