Efficient Computation of Deep Nonlinear Infinite-Width Neural Networks that Learn Features

International Conference on Learning Representations (ICLR), 2022

Abstract
While a popular limit of infinite-width neural networks, the Neural Tangent Kernel (NTK) often exhibits performance gaps from finite-width neural networks on standard datasets, due to its lack of feature learning. Although the feature learning *maximal update limit*, or *μ-limit* (Yang and Hu, 2020), of wide networks has closed this gap for 1-hidden-layer linear models, no one has been able to demonstrate the same for deep nonlinear multi-layer perceptrons (MLPs) because of the μ-limit's computational difficulty in this setting. Here, we solve this problem by proposing a novel feature learning limit, the *π-limit*, that bypasses the computational issues. The π-limit, in short, is the limit of a form of projected gradient descent, and the π-limit of an MLP is roughly another MLP in which gradients are appended to the weights during training. We prove its almost sure convergence with width using the Tensor Programs technique. We evaluate it on CIFAR10 and Omniglot against the NTK as well as finite networks, finding that the π-limit outperforms finite-width models trained normally (without projection) in both settings, closing the performance gap between finite- and infinite-width neural networks previously left by the NTK. Code for this work is available at github.com/santacml/pilim.
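The following is a minimal toy sketch, not the paper's actual π-limit construction, intended only to illustrate the abstract's informal description of "an MLP where gradients are appended to weights during training": after each step, the scaled negative gradients are concatenated to the weight matrices as extra hidden units, widening the network rather than updating it in place. All names and hyperparameters here are hypothetical.

```python
# Hypothetical illustration of appending gradients to weights (not the paper's algorithm).
import torch

torch.manual_seed(0)
d_in, width, d_out, n, lr = 8, 16, 1, 32, 0.1

X = torch.randn(n, d_in)   # toy inputs
y = torch.randn(n, d_out)  # toy targets

W = torch.randn(d_in, width, requires_grad=True)   # hidden-layer weights
v = torch.randn(width, d_out, requires_grad=True)  # readout weights

def forward(X, W, v):
    return torch.relu(X @ W) @ v

for step in range(3):
    loss = ((forward(X, W, v) - y) ** 2).mean()
    loss.backward()

    with torch.no_grad():
        # Instead of the usual in-place update, append the scaled negative
        # gradients as new hidden units, widening the network at each step.
        W_wide = torch.cat([W, -lr * W.grad], dim=1)
        v_wide = torch.cat([v, -lr * v.grad], dim=0)

    # Re-register the widened weights as trainable leaf tensors.
    W = W_wide.detach().requires_grad_(True)
    v = v_wide.detach().requires_grad_(True)
    print(f"step {step}: loss={loss.item():.4f}, width={W.shape[1]}")
```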
Keywords
infinite-width neural network, feature learning, maximal update parametrization, NTK