The Tunnel Effect: Building Data Representations in Deep Neural Networks

Wojciech Masarczyk, Mateusz Ostaszewski, Ehsan Imani, Razvan Pascanu, Piotr Miłoś, Tomasz Trzciński

NeurIPS (2023)


Abstract

Deep neural networks are widely known for their remarkable effectiveness across various tasks, with the consensus that deeper networks implicitly learn more complex data representations. This paper shows that sufficiently deep networks trained for supervised image classification split into two distinct parts that contribute to the resulting data representations differently. The initial layers create linearly separable representations, while the subsequent layers, which we refer to as "the tunnel", compress these representations and have a minimal impact on the overall performance. We explore the tunnel's behavior through comprehensive empirical studies, highlighting that it emerges early in the training process. Its depth depends on the relation between the network's capacity and task complexity. Furthermore, we show that the tunnel degrades out-of-distribution generalization and discuss its implications for continual learning.
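
The abstract does not say how linear separability or compression is measured. As a rough illustration only, the sketch below shows two generic diagnostics one could apply to the activations of a single layer: a least-squares linear probe (how well a linear classifier separates the classes) and a numerical rank estimate (how many directions the representation actually uses). The function names, the `rel_tol` threshold, and the choice of a least-squares probe are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def numerical_rank(features, rel_tol=1e-3):
    """Count singular values of the centered feature matrix that exceed
    rel_tol times the largest one (rel_tol is an arbitrary, assumed threshold)."""
    centered = features - features.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    if singular_values[0] == 0:
        return 0
    return int(np.sum(singular_values > rel_tol * singular_values[0]))


def linear_probe_accuracy(train_x, train_y, test_x, test_y, num_classes):
    """Fit a least-squares linear classifier on one-hot targets and
    return its accuracy on the held-out features."""
    def add_bias(x):
        # Append a constant column so the probe can learn an offset.
        return np.hstack([x, np.ones((x.shape[0], 1))])

    targets = np.eye(num_classes)[train_y]
    weights, *_ = np.linalg.lstsq(add_bias(train_x), targets, rcond=None)
    predictions = np.argmax(add_bias(test_x) @ weights, axis=1)
    return float(np.mean(predictions == test_y))
```

Applied layer by layer to activations extracted from a trained network, a pattern consistent with the description above would be probe accuracy that saturates after the initial layers while the numerical rank keeps falling in the later ones.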


Keywords

deep neural networks, building data representations, tunnel effect, neural networks
