Fast Frequent Patterns Mining by Multiple Sampling With Tight Guarantee Under Bayesian Statistics

IEEE Transactions on Cybernetics (2023)

Abstract
Sampling from a large dataset is commonly used in frequent pattern (FP) mining. To provide a tight theoretical guarantee on the quality of the FPs obtained from samples, current methods stabilize the supports of all patterns in the random samples, even though only the FPs matter, so they consistently overestimate the required sample size. We propose an algorithm called multiple sampling-based FP mining (MSFP). The MSFP first generates the set of approximate frequent items (AFI) and uses the AFI to form the set of approximate FPs without supports (AFP*). In these steps it does not stabilize the value of any item's or pattern's support, but only the relationship (>= or <) between the support and the minimum support, so the MSFP can use small samples to obtain the AFI and AFP* successively, and can successively prune the patterns not contained in the AFI and not in the AFP*. Then, the MSFP introduces Bayesian statistics to stabilize only the support values of the patterns in AFP*. Since a pattern's support in the original dataset is unknown, the MSFP regards it as random and keeps updating its distribution with the approximations obtained from the samples taken during progressive sampling, so the error probability can be bounded more tightly. Furthermore, to reduce I/O during progressive sampling, the MSFP stores a sufficiently large random sample in memory in advance. Experiments show that the MSFP is reliable and efficient.
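
The abstract does not give implementation details, but the Bayesian step it describes can be pictured as a conjugate Beta-Binomial update of a pattern's unknown support across progressive samples. The sketch below is an illustrative assumption, not the MSFP algorithm itself: the Beta(1, 1) prior, the sample sizes, the minsup value, and the stopping rule based on posterior mass are hypothetical choices made for the example.

# Illustrative sketch only (not the authors' implementation): treat a pattern's
# unknown support in the full dataset as random, give it a Beta prior, and
# update the posterior after each progressive sample.
from scipy.stats import beta

def update_support_posterior(a, b, sample_size, hits):
    """Conjugate Beta-Binomial update: `hits` of `sample_size` sampled
    transactions contain the pattern."""
    return a + hits, b + (sample_size - hits)

def stable_decision(a, b, minsup, delta):
    """Return '>=', '<', or None depending on whether the posterior puts at
    least 1 - delta of its mass on one side of minsup."""
    p_ge = beta.sf(minsup, a, b)  # P(support >= minsup) under the posterior
    if p_ge >= 1 - delta:
        return ">="
    if 1 - p_ge >= 1 - delta:
        return "<"
    return None  # undecided: take another sample

# Toy usage: hypothetical progressive samples of 1,000 transactions each.
a, b = 1.0, 1.0                    # uninformative Beta(1, 1) prior (assumed)
for hits in (61, 58, 66):          # hypothetical pattern counts per sample
    a, b = update_support_posterior(a, b, 1_000, hits)
    decision = stable_decision(a, b, minsup=0.05, delta=0.01)
    if decision is not None:
        print("decision:", decision)
        break

With a conjugate prior, each additional sample only refines the posterior, which mirrors the abstract's description of successively updating the distribution of an unknown support until its error probability is acceptably small.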
Keywords
Bayes methods, Approximation algorithms, Upper bound, Tools, Indexes, Cybernetics, Task analysis, Bayesian statistics, frequent pattern (FP), sampling, tight guarantee