Approximate computing has become a promising mechanism to trade off accuracy for efficiency. The idea behind approximate computing is to compute over a representative sample instead of the entire input dataset. Thus, by choosing the sample size, approximate computing can make a systematic trade-off between output accuracy and computational efficiency. Unfortunately, state-of-the-art systems for approximate computing primarily target batch analytics, where the input data remains unchanged during the course of computation. They are therefore not well-suited for stream analytics. This motivated the design of STREAMAPPROX, a stream analytics system for approximate computing. To realize this idea, an online stratified reservoir sampling algorithm is designed to produce approximate output with rigorous error bounds. Importantly, the proposed algorithm is generic and can be applied to two prominent types of stream processing systems: (1) batched stream processing, such as Apache Spark Streaming, and (2) pipelined stream processing, such as Apache Flink.
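As a hedged illustration of the general technique (not STREAMAPPROX's actual implementation), the Python sketch below shows stratified reservoir sampling in its textbook form. Each stratum, for example each input sub-stream, keeps its own fixed-size reservoir maintained with Algorithm R, and the per-stratum item counts are used to weight the sample when estimating an aggregate. The class name `StratifiedReservoir`, the `capacity` parameter, and the normal-approximation error bound with `z = 1.96` are assumptions made for illustration only.

```python
import math
import random
from collections import defaultdict


class StratifiedReservoir:
    """Hypothetical helper: one fixed-size reservoir per stratum (Algorithm R)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity              # assumed per-stratum reservoir size
        self.rng = random.Random(seed)
        self.reservoirs = defaultdict(list)   # stratum -> sampled values
        self.counts = defaultdict(int)        # stratum -> items seen so far (C_i)

    def add(self, stratum, value):
        """Offer one stream item to the reservoir of its stratum."""
        self.counts[stratum] += 1
        n = self.counts[stratum]
        res = self.reservoirs[stratum]
        if len(res) < self.capacity:
            res.append(value)                 # fill phase: keep every item
        else:
            j = self.rng.randrange(n)         # uniform index in [0, n)
            if j < self.capacity:
                res[j] = value                # item kept with probability capacity / n

    def estimate_sum(self, z=1.96):
        """Return (estimated total, error bound) from the stratified sample."""
        total, variance = 0.0, 0.0
        for stratum, sample in self.reservoirs.items():
            c, y = self.counts[stratum], len(sample)
            mean = sum(sample) / y
            total += c * mean                 # same as (C_i / Y_i) * sum(sample)
            if y > 1:
                s2 = sum((v - mean) ** 2 for v in sample) / (y - 1)
                # Variance of the stratum total with finite-population correction.
                variance += c * (c - y) * s2 / y
        return total, z * math.sqrt(variance)


# Usage: two sub-streams (even and odd values), 100 samples kept per stratum.
sampler = StratifiedReservoir(capacity=100, seed=42)
for i in range(10_000):
    sampler.add("even" if i % 2 == 0 else "odd", float(i))
est, err = sampler.estimate_sum()
print(f"sum estimate: {est:.0f} +/- {err:.0f} (exact sum is 49995000)")
```

Keeping a separate reservoir per stratum ensures that a low-rate sub-stream is still represented in the sample even when a high-rate sub-stream dominates the merged stream, which a single reservoir over the whole stream cannot guarantee.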
| Field | Value |
|---|---|
| Title of host publication | Encyclopedia of Big Data Technologies |
| Editors | Sherif Sakr, Albert Zomaya |
| Number of pages | 8 |
| ISBN (Print) | 3319775243, 978-3319775241, 978-3-319-77526-5 |
| Publication status | Published - 1 Mar 2019 |