Abstract / Description of output
In large data streams consisting of sequences of data items, those appearing over a long period of time are regarded as persistent. Compared with frequent items, persistent items do not necessarily hold large amounts of data and thus may hamper the effectiveness of vanilla volume-based detectors. Identifying persistent items plays a crucial role in a range of areas such as fraud detection and network management. Fast detection of persistent items in massive streams is however challenging due to the inherently high data rates, while state-of-the-art persistent item lookup solutions routinely require large enough memory to attain high accuracy, which questions the feasibility of deploying them in practice. In this paper, we introduce P-Sketch, a novel approach to persistent item lookup that achieves high accuracy even with small memory (L1 Cache) budgets and maintains high update speed across different settings. Specifically, we introduce the concept of arrival continuity (hotness) that counts the number of consecutive windows in which an item appears, to effectively protect persistent items from being wrongly replaced by non-persistent ones. Through meticulous data analysis, we also reveal that items with higher persistence tend to possess a stronger hotness than non-persistent ones. Thus, we harness the information of persistence and hotness, and employ a probability-based replacement strategy to achieve a good balance between memory efficiency, lookup accuracy, and update speed. We also present a theoretical analysis of the performance of the proposed P-Sketch. Through trace-driven emulations, we demonstrate that our P-Sketch yields average F1 score and update throughput gains of up to 10.32 $\times$ and respectively 2.9 $\times$ , over existing schemes. Lastly, we show how to further boost the P-Sketch’s update speed with Single Instruction Multiple Data (SIMD) instructions.
Original language | English |
---|---|
Article number | 10230989 |
Pages (from-to) | 987-1002 |
Number of pages | 16 |
Journal | IEEE/ACM Transactions on Networking |
Volume | 32 |
Issue number | 2 |
DOIs | |
Publication status | Published - 25 Aug 2023 |
Keywords / Materials (for Non-textual outputs)
- memory management
- codes
- throughput
- IEEE transactions
- fraud
- decoding
- real-time systems