Abstract
Data stream processing plays a pivotal role in various web-related applications, including click fraud detection, anomaly identification, and recommendation systems. Accurate and fast detection of items relevant to such tasks within data streams, e.g., heavy hitters, heavy changers, and persistent items, is however non-trivial. This is due to growing streaming speeds, limited fast memory (L1 cache) available in current systems, and highly skewed item distributions encountered in practice. In effect, items of interest that are tracked only based on their features (e.g., item frequency or persistence value) are susceptible to replacement by non-relevant ones, leading to modest detection accuracy, as we reveal. In this work, we introduce the notion of bucket stability, which quantifies the degree of recorded item variation, and show that this is a powerful metric for identifying distinct item types. We propose Stable-Sketch, an elegant and versatile sketch that exploits multidimensional information, including item statistics and bucket stability, and adopts a stochastic approach to drive replacement decisions. We present a theoretical analysis of the error bounds of Stable-Sketch, and conduct extensive experiments to demonstrate that our solution achieves substantially higher accuracy and faster processing speeds than state-of-the-art sketches in a range of item detection tasks, even with tight memories. We further enhance Stable-Sketch's update throughput with Single Instruction Multiple Data (SIMD) instructions and implement our solution with P4, demonstrating real world deployment viability.
Original language | English |
---|---|
Title of host publication | WWW '24 |
Subtitle of host publication | Proceedings of the ACM Web Conference 2024 |
Editors | Roy Ka-Wei Lee |
Place of Publication | New York, NY, United States |
Publisher | Association for Computing Machinery (ACM) |
Pages | 4227-4238 |
Number of pages | 12 |
ISBN (Electronic) | 9798400701719 |
DOIs | |
Publication status | Published - 13 May 2024 |
Event | The ACM Web Conference 2024 - Resorts World Convention Centre, Sentosa, Singapore Duration: 13 May 2024 → 17 May 2024 https://www2024.thewebconf.org |
Conference
Conference | The ACM Web Conference 2024 |
---|---|
Abbreviated title | WWW '24 |
Country/Territory | Singapore |
City | Sentosa |
Period | 13/05/24 → 17/05/24 |
Internet address |
Keywords / Materials (for Non-textual outputs)
- sketch
- data stream
- bucket stability
- heavy items
- persistent items