Scheduling in mapreduce-like systems for fast completion time

H. Chang, M. Kodialam, R. R. Kompella, T. V. Lakshman, M. Lee, S. Mukherjee

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Large-scale data processing needs of enterprises today are primarily met with distributed and parallel computing in data centers. MapReduce has emerged as an important programming model for these environments. Since today's data centers run many MapReduce jobs in parallel, it is important to find a good scheduling algorithm that can optimize the completion times of these jobs. While several recent papers focused on optimizing the scheduler, there exists very little theoretical understanding of the scheduling problem in the context of MapReduce. In this paper, we seek to address this problem by first presenting a simplified abstraction of the MapReduce scheduling problem, and then formulate the scheduling problem as an optimization problem.We devise various online and offline algorithms to arrive at a good ordering of jobs to minimize the overall job completion times. Since optimal solutions are hard to compute (NP-hard), we propose approximation algorithms that work within a factor of 3 of the optimal. Using simulations, we also compare our online algorithm with standard scheduling strategies such as FIFO, Shortest Job First and show that our algorithm consistently outperforms these across different job distributions.
Original languageEnglish
Title of host publicationINFOCOM, 2011 Proceedings IEEE
PublisherInstitute of Electrical and Electronics Engineers (IEEE)
Pages3074-3082
Number of pages9
ISBN (Print)978-1-4244-9919-9
DOIs
Publication statusPublished - 1 Apr 2011

Keywords

  • computational complexity
  • optimisation
  • parallel processing
  • scheduling
  • MapReduce scheduling problem
  • MapReduce-like systems
  • NP-hard
  • approximation algorithm
  • data centers
  • distributed computing
  • fast completion time
  • optimization problem
  • parallel computing
  • programming model
  • scheduling algorithm
  • standard scheduling strategies
  • Approximation algorithms
  • Approximation methods
  • Equations
  • Optimal scheduling
  • Optimized production technology
  • Processor scheduling
  • Program processors

Fingerprint

Dive into the research topics of 'Scheduling in mapreduce-like systems for fast completion time'. Together they form a unique fingerprint.

Cite this