Fast Heuristics for Near-Optimal Task Allocation in Data Stream Processing over Clusters

Andreas Chatzistergiou, Stratis D. Viglas

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract / Description of output

We study provisioning and job reconfiguration techniques for adapting to execution environment changes when processing data streams on cluster-based deployments. By monitoring the performance of an executing job, we identify computation and communication bottlenecks. In such cases we reconfigure the job by reallocating its tasks to minimize the communication cost. Our work targets data-intensive applications where the inter-node transfer latency is significant. We aim to minimize the transfer latency while keeping the nodes below some computational load threshold. We propose a scalable centralized scheme that employs fast allocation heuristics. Our techniques are based on a general group-based job representation that is commonly found in many distributed data stream processing frameworks. Using this representation we devise linear-time task allocation algorithms that improve existing quadratic-time solutions in practical cases. We have implemented and evaluated our proposals using both synthetic and real-world scenarios. Our results show that our algorithms: (a) exhibit significant allocation throughput while producing near-optimal allocations, and (b) significantly improve existing task-level approaches.
Original languageEnglish
Title of host publicationProceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, CIKM 2014, Shanghai, China, November 3-7, 2014
Number of pages10
ISBN (Electronic)978-1-4503-2598-1
Publication statusPublished - 3 Nov 2014


Dive into the research topics of 'Fast Heuristics for Near-Optimal Task Allocation in Data Stream Processing over Clusters'. Together they form a unique fingerprint.

Cite this