
Detecting Duplicates over Sliding Windows with RAM-Efficient Detached Counting Bloom Filter Arrays

Networking, Architecture and Storage (2011)

Abstract
Detecting duplicates over sliding windows is an important technique for monitoring and analysing data streams. Because recording exact information about every element in a sliding window can be RAM-intensive and incur unacceptable search complexity, several approximate membership representation schemes have been proposed to build fast in-memory indices. However, challenges in RAM utilization and scalability remain. This paper proposes a Detached Counting Bloom filter Array (DCBA) to detect duplicates over sliding windows flexibly and efficiently. A DCBA consists of an array of detached counting Bloom filters (DCBFs), where each DCBF is essentially a Bloom filter associated with a detached timer (counter) array. The DCBA scheme functions as a circular FIFO queue: it keeps a filling DCBF that accommodates fresh elements and a decaying DCBF from which stale elements are evicted. DCBA allows the timer arrays of fully filled DCBFs to be offloaded to disk, which greatly improves memory space efficiency. Fully filled DCBFs remain unchanged until their elements become stale, so a DCBA can be replicated efficiently for data reliability or information sharing. Furthermore, a DCBA can be maintained cooperatively by clustered nodes, which provides a scalable solution for mining massive data streams. Mathematical analysis and experimental results show that a DCBA containing 64 DCBFs needs fewer than 10% of its components to be kept in RAM while retaining more than 95% of its query performance, significantly outperforming existing schemes in memory efficiency and scalability.
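To make the filling/decaying ring concrete, here is a minimal sketch of a DCBA-style structure, not the authors' implementation. It assumes a time-based window of length `window`, split evenly across the ring so that each DCBF covers `window / (num_filters - 1)` time units; it assumes each timer cell stores the most recent insertion time for its bit; and it uses double hashing over a SHA-256 digest. The class and method names (`DCBF`, `DCBA`, `insert`, `contains`) and all parameters are hypothetical. The point it illustrates is that only the decaying DCBF's timers are consulted during queries, which is why the timer arrays of fully filled DCBFs could live on disk.

```python
import hashlib


class DCBF:
    """One detached counting Bloom filter (sketch): a bit array plus a detached timer array."""

    def __init__(self, m):
        self.bits = bytearray(m)  # membership bits, always kept in RAM
        self.timers = [0] * m     # last insertion time per cell (the "detached" part)

    def clear(self):
        self.bits = bytearray(len(self.bits))
        self.timers = [0] * len(self.timers)


class DCBA:
    """Hypothetical ring of DCBFs covering a time-based sliding window of length `window`."""

    def __init__(self, window, num_filters=8, m=1 << 12, k=4):
        if num_filters < 2 or window <= 0 or window % (num_filters - 1):
            raise ValueError("window must be a positive multiple of num_filters - 1")
        self.window, self.n, self.m, self.k = window, num_filters, m, k
        self.slice_len = window // (num_filters - 1)  # time span covered by one DCBF
        self.filters = [DCBF(m) for _ in range(num_filters)]
        self.cur_slice = 0  # slice index currently owned by the filling DCBF

    def _positions(self, item):
        # Double hashing from one SHA-256 digest (an assumption, not the paper's hash family).
        d = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def _advance(self, now):
        # Rotate the circular FIFO so the filling DCBF covers `now` (time must not decrease).
        target = now // self.slice_len
        for s in range(1, min(target - self.cur_slice, self.n) + 1):
            # A recycled slot held a slice that has left the window entirely.
            self.filters[(self.cur_slice + s) % self.n].clear()
        self.cur_slice = max(self.cur_slice, target)

    def contains(self, item, now):
        """True if `item` was (probably) inserted at some t with now - t < window."""
        self._advance(now)
        pos = self._positions(item)
        decaying = (self.cur_slice + 1) % self.n  # oldest DCBF, possibly partly stale
        horizon = now - self.window
        for idx, f in enumerate(self.filters):
            if idx == decaying:
                # Only the decaying DCBF consults its timers to skip stale cells.
                if all(f.bits[p] and f.timers[p] > horizon for p in pos):
                    return True
            elif all(f.bits[p] for p in pos):
                # Filling and fully filled DCBFs are answered from bits alone,
                # so their timer arrays are never read here.
                return True
        return False

    def insert(self, item, now):
        """Record `item` at time `now`; return True if it was a (probable) duplicate."""
        duplicate = self.contains(item, now)
        filling = self.filters[self.cur_slice % self.n]
        for p in self._positions(item):
            filling.bits[p] = 1
            filling.timers[p] = now
        return duplicate
```

For example, with `window=6` and `num_filters=4`, inserting `"a"` at time 0 and again at time 3 reports the second insertion as a duplicate. Querying `"a"` at time 9, six time units after its last insertion, returns `False`. This sketch keeps every timer array in a Python list. The offloading described above is reflected only in the fact that non-decaying timers are never read; a disk-backed version would also need to reload a DCBF's timers before that DCBF becomes the decaying one.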
Key words
massive data stream,exact information,information sharing,bloom filter arrays,detecting duplicates,analysing data stream,bloom filter,ram utilization,sliding windows,decaying dcbf,dcba scheme function,data reliability,memory efficiency,data structure,mathematical analysis,queueing theory,radiation detector,sliding window,data mining,radiation detectors,data structures,data analysis