P-Sketch: A Fast and Accurate Sketch for Persistent Item Lookup

Weihe Li, Paul Patras

Research output: Contribution to journalArticlepeer-review

Abstract / Description of output

In large data streams consisting of sequences of data items, those appearing over a long period of time are regarded as persistent. Compared with frequent items, persistent items do not necessarily hold large amounts of data and thus may hamper the effectiveness of vanilla volume-based detectors. Identifying persistent items plays a crucial role in a range of areas such as fraud detection and network management. Fast detection of persistent items in massive streams is however challenging due to the inherently high data rates, while state-of-the-art persistent item lookup solutions routinely require large enough memory to attain high accuracy, which questions the feasibility of deploying them in practice. In this paper, we introduce P-Sketch, a novel approach to persistent item lookup that achieves high accuracy even with small memory (L1 Cache) budgets and maintains high update speed across different settings. Specifically, we introduce the concept of arrival continuity (hotness) that counts the number of consecutive windows in which an item appears, to effectively protect persistent items from being wrongly replaced by non-persistent ones. Through meticulous data analysis, we also reveal that items with higher persistence tend to possess a stronger hotness than non-persistent ones. Thus, we harness the information of persistence and hotness, and employ a probability-based replacement strategy to achieve a good balance between memory efficiency, lookup accuracy, and update speed. We also present a theoretical analysis of the performance of the proposed P-Sketch. Through trace-driven emulations, we demonstrate that our P-Sketch yields average F1 score and update throughput gains of up to 10.32 $\times$ and respectively 2.9 $\times$ , over existing schemes. Lastly, we show how to further boost the P-Sketch’s update speed with Single Instruction Multiple Data (SIMD) instructions.
Original languageEnglish
Article number10230989
Pages (from-to)987-1002
Number of pages16
JournalIEEE/ACM Transactions on Networking
Volume32
Issue number2
DOIs
Publication statusPublished - 25 Aug 2023

Keywords / Materials (for Non-textual outputs)

  • memory management
  • codes
  • throughput
  • IEEE transactions
  • fraud
  • decoding
  • real-time systems

Fingerprint

Dive into the research topics of 'P-Sketch: A Fast and Accurate Sketch for Persistent Item Lookup'. Together they form a unique fingerprint.

Cite this