Stable-Sketch: A versatile sketch for accurate, fast, Web-scale data stream processing

Weihe Li, Paul Patras

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Data stream processing plays a pivotal role in various web-related applications, including click fraud detection, anomaly identification, and recommendation systems. Accurate and fast detection of items relevant to such tasks within data streams, e.g., heavy hitters, heavy changers, and persistent items, is however non-trivial. This is due to growing streaming speeds, limited fast memory (L1 cache) available in current systems, and highly skewed item distributions encountered in practice. In effect, items of interest that are tracked only based on their features (e.g., item frequency or persistence value) are susceptible to replacement by non-relevant ones, leading to modest detection accuracy, as we reveal. In this work, we introduce the notion of bucket stability, which quantifies the degree of recorded item variation, and show that this is a powerful metric for identifying distinct item types. We propose Stable-Sketch, an elegant and versatile sketch that exploits multidimensional information, including item statistics and bucket stability, and adopts a stochastic approach to drive replacement decisions. We present a theoretical analysis of the error bounds of Stable-Sketch, and conduct extensive experiments to demonstrate that our solution achieves substantially higher accuracy and faster processing speeds than state-of-the-art sketches in a range of item detection tasks, even with tight memories. We further enhance Stable-Sketch's update throughput with Single Instruction Multiple Data (SIMD) instructions and implement our solution with P4, demonstrating real world deployment viability.
Original languageEnglish
Title of host publicationWWW '24
Subtitle of host publicationProceedings of the ACM Web Conference 2024
EditorsRoy Ka-Wei Lee
Place of PublicationNew York, NY, United States
PublisherAssociation for Computing Machinery (ACM)
Pages4227-4238
Number of pages12
ISBN (Electronic)9798400701719
DOIs
Publication statusPublished - 13 May 2024
EventThe ACM Web Conference 2024 - Resorts World Convention Centre, Sentosa, Singapore
Duration: 13 May 202417 May 2024
https://www2024.thewebconf.org

Conference

ConferenceThe ACM Web Conference 2024
Abbreviated titleWWW '24
Country/TerritorySingapore
CitySentosa
Period13/05/2417/05/24
Internet address

Keywords / Materials (for Non-textual outputs)

  • sketch
  • data stream
  • bucket stability
  • heavy items
  • persistent items

Fingerprint

Dive into the research topics of 'Stable-Sketch: A versatile sketch for accurate, fast, Web-scale data stream processing'. Together they form a unique fingerprint.

Cite this