Abstract / Description of output
Accurate and fast data stream mining is critical and fundamental to many tasks, including time series database handling, big data management and machine learning. Different heavy-based detection tasks, such as heavy hitter, heavy changer, persistent item and significant item detection, have drawn much attention from both the industry and academia. Unfortunately, due to the growing data stream speeds and limited memory (L1 cache) available for real-time processing, existing schemes face challenges in simultaneously achieving high detection accuracy, high memory efficiency, and fast update throughput, as we reveal. To tackle this conundrum, we propose a versatile and elegant sketch framework named Tight-Sketch, which supports a spectrum of heavy-based detection tasks. Considering that most items are cold (non-heavy/persistent/significant) in practice, we employ different eviction treatments for different types of items to discard these potentially cold ones as soon as possible, and offer more protection to those that are hot (heavy/persistent/significant). In addition, we propose an eviction method that follows a stochastic decay strategy, enabling Tight-Sketch to only bear small one-sided errors (no overestimation). We present a theoretical analysis of the error bounds and conduct extensive experiments on diverse detection tasks to demonstrate that Tight-Sketch significantly outperforms existing methods in terms of accuracy and update speed. Lastly, we accelerate Tight-Sketch’s update throughput by up to 36% with Single Instruction Multiple Data (SIMD) instructions.
Original language | English |
---|---|
Title of host publication | CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management |
Publisher | ACM Association for Computing Machinery |
Pages | 1328–1337 |
Number of pages | 10 |
ISBN (Electronic) | 9798400701245 |
DOIs | |
Publication status | Published - 21 Oct 2023 |
Event | 32nd ACM International Conference on Information and Knowledge Management - Birmingham, United Kingdom Duration: 21 Oct 2023 → 25 Oct 2023 Conference number: 32 https://uobevents.eventsair.com/cikm2023/ |
Conference
Conference | 32nd ACM International Conference on Information and Knowledge Management |
---|---|
Abbreviated title | CIKM 2023 |
Country/Territory | United Kingdom |
City | Birmingham |
Period | 21/10/23 → 25/10/23 |
Internet address |
Keywords / Materials (for Non-textual outputs)
- data stream mining
- heavy item
- persistent item
- significant item
- sustained arrival strength