A Sequential Greedy Approach for Training Implicit Deep Models.

CDC (2022)

Abstract
Recent works in deep learning have demonstrated impressive performance using "implicit deep models," wherein conventional architectures composed of forward-propagating, differentiable parametric layers are replaced by more expressive models composed of an implicitly defined fixed-point equation together with a prediction equation. Methods for training implicit deep models are currently restricted to end-to-end optimization, which relies on solving a matrix-variable fixed-point equation to compute the gradient and an expensive projection step at every iteration. In this work, we extend the idea of greedy layer-wise training, an approach found to yield state-of-the-art performance in conventional deep learning, to a sequential greedy training algorithm for implicit deep models with a strictly upper block triangular structure. We show that such implicit models can be regarded as generalized dense block modules of Dense Convolutional Networks (DenseNets), and thus inherit the underlying parameter efficiency property. For models trained with the Euclidean loss, we develop an alternating minimization subroutine for our sequential optimization algorithm, which alternates between efficiently solvable least squares problems and single hidden-layer training problems. Furthermore, we theoretically prove that training a non-strictly upper triangular ReLU implicit model is equivalent to training a strictly upper block triangular one, allowing our algorithm to be applied to even more general models. Experiments on smooth and nonsmooth function interpolation, and on MNIST and Fashion-MNIST classification tasks, show that our algorithm consistently converges to models that outperform those obtained by state-of-the-art end-to-end implicit learning.
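A minimal sketch of the implicit-model formulation this line of work builds on (the notation is assumed from the implicit deep learning literature and may differ from the paper): the hidden state x is defined implicitly by a fixed-point equation, and the output by a prediction equation,

    x = \phi(A x + B u)            (fixed-point equation)
    \hat{y}(u) = C x + D u         (prediction equation)

where u is the input, \phi is an elementwise nonlinearity such as ReLU, and (A, B, C, D) are the trainable parameters. Under the strictly upper block triangular restriction on A considered in the abstract, the blocks of x can be computed exactly in sequence by back-substitution rather than by iterating to a fixed point, which is the DenseNet-like dense-block structure the abstract refers to.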
Keywords
implicit deep models, sequential greedy approach, training