Abstract
Sliding-window computations are widely used for large-scale data analysis, particularly in live systems where new data arrives continuously. These computations consume significant computational resources because they usually recompute over the full window of data every time the window slides. In this chapter, we propose techniques for improving the scalability of sliding-window computations by performing them incrementally. In our approach, when some new data is added at the end of the window or old data dropped from its beginning, the output is updated automatically and efficiently by reusing previously run sub-computations.
The key idea behind our approach is to organize the sub-computations as a shallow (logarithmic depth) balanced tree and perform incremental updates by propagating changes through this tree. This approach is motivated and inspired by advances on self-adjusting computation, which enables automatic and efficient incremental computation. We present an Hadoop based implementation that also provides a dataflow query processing interface.We evaluate it with a variety of applications and real-world case studies. Our results show significant performance improvements for large-scale sliding-window computations without any modifications to the existing data analysis code.
The key idea behind our approach is to organize the sub-computations as a shallow (logarithmic depth) balanced tree and perform incremental updates by propagating changes through this tree. This approach is motivated and inspired by advances on self-adjusting computation, which enables automatic and efficient incremental computation. We present an Hadoop based implementation that also provides a dataflow query processing interface.We evaluate it with a variety of applications and real-world case studies. Our results show significant performance improvements for large-scale sliding-window computations without any modifications to the existing data analysis code.
Original language | English |
---|---|
Title of host publication | Encyclopedia of Big Data Technologies |
Editors | Sherif Sakr, Albert Zomaya |
Publisher | Springer-Verlag |
Chapter | I |
Pages | 1007-1015 |
Number of pages | 9 |
Edition | 1 |
ISBN (Electronic) | 978-3-319-77525-8 |
ISBN (Print) | 3319775243, 978-3319775241, 978-3-319-77526-5 |
DOIs | |
Publication status | Published - 1 Mar 2019 |