Generalization error of spectral algorithms
ICLR 2024
Abstract
The asymptotically precise estimation of the generalization of kernel methods
has recently received attention due to the parallels between neural networks
and their associated kernels. However, prior works derive such estimates for
training by kernel ridge regression (KRR), whereas neural networks are
typically trained with gradient descent (GD). In the present work, we consider
the training of kernels with a family of spectral algorithms specified by a
profile h(λ), which includes KRR and GD as special cases. We then derive the
generalization error as a functional of the learning profile h(λ) for two data
models: a high-dimensional Gaussian model and a low-dimensional
translation-invariant model. Under power-law assumptions on the spectra of the
kernel and the target, we use our framework to (i) give full loss asymptotics
for both noisy and noiseless observations, (ii) show that the loss localizes on
certain spectral scales, giving a new perspective on the KRR saturation
phenomenon, and (iii) conjecture, and demonstrate for the considered data
models, the universality of the loss w.r.t. non-spectral details of the
problem, but only in the case of noisy observations.
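As a minimal sketch (not the paper's code): a spectral algorithm acts on the kernel eigendecomposition through a learning profile h(λ), and KRR and gradient flow correspond to particular profiles. The function names, the closed-form gradient-flow profile, and the toy RBF kernel below are illustrative assumptions.

```python
import numpy as np

def h_krr(lam, eta):
    # KRR with regularization eta: h(l) = 1 / (l + eta)
    return 1.0 / (lam + eta)

def h_gradient_flow(lam, t):
    # Gradient flow trained for time t: h(l) = (1 - exp(-t*l)) / l
    return (1.0 - np.exp(-t * lam)) / lam

def spectral_estimator(K, y, h):
    # Predictions on the training inputs: f_hat = U h(Lambda) Lambda U^T y.
    # For h_krr this reduces to the familiar K (K + eta I)^{-1} y.
    lam, U = np.linalg.eigh(K)
    lam = np.clip(lam, 1e-12, None)  # guard against tiny negative eigenvalues
    return U @ (h(lam) * lam * (U.T @ y))

# Toy usage: RBF kernel on 1-D inputs, smooth target.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)
K = np.exp(-(x[:, None] - x[None, :]) ** 2)
y = np.sin(np.pi * x)

f_krr = spectral_estimator(K, y, lambda l: h_krr(l, eta=1e-3))
f_gf = spectral_estimator(K, y, lambda l: h_gradient_flow(l, t=1e3))
print(np.mean((f_krr - y) ** 2), np.mean((f_gf - y) ** 2))
```

Both profiles drive the training error down; they differ in how h(λ) suppresses small-eigenvalue directions (shrinkage 1/(λ+η) versus early-stopping-style damping 1 − e^{−tλ}), which is exactly the non-trivial freedom the paper's functional captures.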
Keywords
gradient descent, kernel ridge regression, optimal algorithm, generalization, asymptotic error rates, power-laws