Maximum Coverage in Sublinear Space, Faster

arxiv(2023)

引用 0|浏览14
暂无评分
摘要
Given a collection of $m$ sets from a universe $\mathcal{U}$, the Maximum Set Coverage problem consists of finding $k$ sets whose union has largest cardinality. This problem is NP-Hard, but the solution can be approximated by a polynomial time algorithm up to a factor $1-1/e$. However, this algorithm does not scale well with the input size. In a streaming context, practical high-quality solutions are found, but with space complexity that scales linearly with respect to the size of the universe $|\mathcal{U}|$. However, one randomized streaming algorithm has been shown to produce a $1-1/e-\varepsilon$ approximation of the optimal solution with a space complexity that scales only poly-logarithmically with respect to $m$ and $|\mathcal{U}|$. In order to achieve such a low space complexity, the authors used a technique called subsampling, based on independent-wise hash functions. This article focuses on this sublinear-space algorithm and introduces methods to reduce the time cost of subsampling. We first show how to accelerate by several orders of magnitude without altering the space complexity, number of passes and approximation quality of the original algorithm. Secondly, we derive a new lower bound for the probability of producing a $1-1/e-\varepsilon$ approximation using only pairwise independence: $1-\tfrac{4}{c k \log m}$ compared to the original $1-\tfrac{2e}{m^{ck/6}}$. Although the theoretical approximation guarantees are weaker, for large streams, our algorithm performs well in practice and present the best time-space-performance trade-off for maximum coverage in streams.
更多
查看译文
关键词
maximum coverage,sublinear space
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要