D-Cube: Dense-Block Detection In Terabyte-Scale Tensors

WSDM(2017)

引用 72|浏览116
暂无评分
摘要
How can we detect fraudulent lockstep behavior in large-scale multi-aspect data (i.e., tensors)? Can we detect it when data are too large to fit in memory or even on a disk? Past studies have shown that dense blocks in real-world tensors (e.g., social media, Wikipedia, TCP dumps, etc.) signal anomalous or fraudulent behavior such as retweet boosting, bot activities, and network attacks. Thus, various approaches, including tensor decomposition and search, have been used for rapid and accurate dense-block detection in tensors. However, all such methods have low accuracy, or assume that tensors are small enough to fit in main memory, which is not true in many real-world applications such as social media and web.To overcome these limitations, we propose D-CuBE, a disk-based dense-block detection method, which also can be run in a distributed manner across multiple machines. Compared with state-of-the-art methods, D-CuBE is (1) Memory Efficient: requires up to 1,600x less memory and handles 1,000x larger data (2.6TB), (2) Fast: up to 5x faster due to its near-linear scalability with all aspects of data, (3) Provably Accurate: gives a guarantee on the densities of the blocks it finds, and (4) Effective: successfully spotted network attacks from TCP dumps and synchronized behavior in rating data with the highest accuracy.
更多
查看译文
关键词
Dense-Block Detection,Anomaly/Fraud Detection,Tensor
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要