Finding Correlations in Subquadratic Time, with Applications to Learning Parities and the Closest Pair Problem

Journal of the ACM (2015)

Abstract
Given a set of n d-dimensional Boolean vectors with the promise that the vectors are chosen uniformly at random, with the exception of two vectors that have Pearson correlation coefficient ρ (Hamming distance d·(1−ρ)/2), how quickly can one find the two correlated vectors? We present an algorithm which, for any constants ε > 0 and ρ > 0, runs in expected time O(n^{(5−ω)/(4−ω)+ε} + nd) < O(n^{1.62} + nd), where ω < 2.4 is the exponent of matrix multiplication. This is the first subquadratic-time algorithm for this problem for which ρ does not appear in the exponent of n, and it improves upon the O(n^{2−O(ρ)})-time approaches of Paturi et al. [1989], the Locality Sensitive Hashing of Motwani [1998], and the Bucketing Codes of Dubiner [2008]. Applications and extensions of this basic algorithm yield significantly improved algorithms for several other problems.

Approximate Closest Pair. For any sufficiently small constant ε > 0, given n d-dimensional vectors, there exists an algorithm that returns a pair of vectors whose Euclidean (or Hamming) distance differs from that of the closest pair by a factor of at most 1+ε, and runs in time O(n^{2−Θ(√ε)}). The best previous algorithms (including Locality Sensitive Hashing) have runtime O(n^{2−O(ε)}).

Learning Sparse Parities with Noise. Given samples from an instance of the learning parities with noise problem where each example has length n, the true parity set has size at most k ≪ n, and the noise rate is η, there exists an algorithm that identifies the set of k indices in time n^{((ω+ε)/3)k} poly(1/(1−2η)) < n^{0.8k} poly(1/(1−2η)). This is the first algorithm with no dependence on η in the exponent of n, aside from the trivial O((n choose k)) ≈ O(n^k) brute-force algorithm, and for large noise rates (η > 0.4) it improves upon the results of Grigorescu et al. [2011], which give a runtime of n^{(1+(2η)²+o(1))k/2} poly(1/(1−2η)).

Learning k-Juntas with Noise. Given uniformly random length-n Boolean vectors, together with a label that is some function of just k ≪ n of the bits, perturbed by noise rate η, return the set of relevant indices. Leveraging the reduction of Feldman et al. [2009], our result for learning k-parities implies an algorithm for this problem with runtime n^{((ω+ε)/3)k} poly(1/(1−2η)) < n^{0.8k} poly(1/(1−2η)), which is the first runtime for this problem of the form n^{ck} with an absolute constant c < 1.

Learning k-Juntas without Noise. Given uniformly random length-n Boolean vectors, together with a label that is some function of k ≪ n of the bits, return the set of relevant indices. Using a modification of the algorithm of Mossel et al. [2004], and employing our algorithm for learning sparse parities with noise via the reduction of Feldman et al. [2009], we obtain an algorithm for this problem with runtime n^{((ω+ε)/4)k} poly(n) < n^{0.6k} poly(n), which improves on the previous best of n^{(ω/(ω+1))k} ≈ n^{0.7k} poly(n) of Mossel et al. [2004].
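To make the planted-correlations problem concrete, the following is a minimal sketch (not the paper's algorithm; all parameter values are hypothetical) that generates an instance and runs the trivial O(n²d) brute-force scan that the O(n^{1.62} + nd) result improves upon. Since the planted pair sits at expected Hamming distance d·(1−ρ)/2 while random pairs concentrate near d/2, the minimum-distance pair is the planted one with high probability.

```python
import numpy as np

def planted_instance(n, d, rho, rng):
    """n uniformly random Boolean vectors of dimension d, except that
    vectors 0 and 1 are planted with correlation rho: each bit of
    vector 1 flips vector 0's bit with probability (1 - rho) / 2, so
    their expected Hamming distance is d * (1 - rho) / 2."""
    X = rng.integers(0, 2, size=(n, d), dtype=np.int8)
    flips = (rng.random(d) < (1 - rho) / 2).astype(np.int8)
    X[1] = X[0] ^ flips
    return rng.permutation(X)  # hide where the planted pair sits

def brute_force_pair(X):
    """The trivial O(n^2 d) baseline: scan all pairs and return the
    one at minimum Hamming distance."""
    n = X.shape[0]
    best_dist, best_pair = None, None
    for i in range(n):
        for j in range(i + 1, n):
            dist = int(np.count_nonzero(X[i] != X[j]))
            if best_dist is None or dist < best_dist:
                best_dist, best_pair = dist, (i, j)
    return best_pair, best_dist

rng = np.random.default_rng(0)
X = planted_instance(n=200, d=2000, rho=0.3, rng=rng)
# Planted pair lies near distance 2000 * 0.35 = 700; random pairs
# concentrate near 1000, so the scan recovers the planted pair.
print(brute_force_pair(X))
```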
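The learning-parities-with-noise results refer to the following sample model, sketched here as a minimal generator (parameter values again hypothetical). Recovering the hidden index set S from such samples is the task solved in time n^{0.8k} poly(1/(1−2η)) above.

```python
import numpy as np

def lpn_samples(m, n, S, eta, rng):
    """m samples of sparse learning parities with noise: each example
    is uniform in {0,1}^n, and its label is the parity of the bits
    indexed by the hidden set S, flipped with probability eta."""
    X = rng.integers(0, 2, size=(m, n), dtype=np.int8)
    parity = X[:, S].sum(axis=1) % 2
    noise = (rng.random(m) < eta).astype(parity.dtype)
    return X, parity ^ noise

# Hypothetical parameters: n = 50 features, a hidden set of size
# k = 3, and high noise rate eta = 0.4, the regime where the paper's
# bound improves on Grigorescu et al. [2011].
rng = np.random.default_rng(1)
X, y = lpn_samples(m=100_000, n=50, S=[3, 17, 42], eta=0.4, rng=rng)
```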
Keywords
Design, Algorithms, Performance, Correlations, nearest neighbor, approximate closest pair, locality sensitive hashing, parity with noise, learning juntas, metric embedding, asymmetric embeddings