Tight-sketch: A high-performance sketch for heavy item-oriented data stream mining with limited memory size

Weihe Li, Paul Patras

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract / Description of output

Accurate and fast data stream mining is critical and fundamental to many tasks, including time series database handling, big data management and machine learning. Different heavy-based detection tasks, such as heavy hitter, heavy changer, persistent item and significant item detection, have drawn much attention from both the industry and academia. Unfortunately, due to the growing data stream speeds and limited memory (L1 cache) available for real-time processing, existing schemes face challenges in simultaneously achieving high detection accuracy, high memory efficiency, and fast update throughput, as we reveal. To tackle this conundrum, we propose a versatile and elegant sketch framework named Tight-Sketch, which supports a spectrum of heavy-based detection tasks. Considering that most items are cold (non-heavy/persistent/significant) in practice, we employ different eviction treatments for different types of items to discard these potentially cold ones as soon as possible, and offer more protection to those that are hot (heavy/persistent/significant). In addition, we propose an eviction method that follows a stochastic decay strategy, enabling Tight-Sketch to only bear small one-sided errors (no overestimation). We present a theoretical analysis of the error bounds and conduct extensive experiments on diverse detection tasks to demonstrate that Tight-Sketch significantly outperforms existing methods in terms of accuracy and update speed. Lastly, we accelerate Tight-Sketch’s update throughput by up to 36% with Single Instruction Multiple Data (SIMD) instructions.
Original languageEnglish
Title of host publicationCIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
PublisherACM Association for Computing Machinery
Pages1328–1337
Number of pages10
ISBN (Electronic)9798400701245
DOIs
Publication statusPublished - 21 Oct 2023
Event32nd ACM International Conference on Information and Knowledge Management - Birmingham, United Kingdom
Duration: 21 Oct 202325 Oct 2023
Conference number: 32
https://uobevents.eventsair.com/cikm2023/

Conference

Conference32nd ACM International Conference on Information and Knowledge Management
Abbreviated titleCIKM 2023
Country/TerritoryUnited Kingdom
CityBirmingham
Period21/10/2325/10/23
Internet address

Keywords / Materials (for Non-textual outputs)

  • data stream mining
  • heavy item
  • persistent item
  • significant item
  • sustained arrival strength

Fingerprint

Dive into the research topics of 'Tight-sketch: A high-performance sketch for heavy item-oriented data stream mining with limited memory size'. Together they form a unique fingerprint.

Cite this