An exploration and analysis of multistage convolutional architectures for object recognition (2013)

Abstract
In this thesis we study various architectures and training procedures for learning sparse convolutional feature hierarchies. We start with a modified form of classical sparse coding that learns a feed-forward function to predict the optimal sparse codes; this function can then be integrated into a globally trained multistage architecture. Experiments across architectures show that rectification and local contrast normalization nonlinearities are the most important ingredients for good accuracy on object recognition benchmarks, and that two stages of feature extraction yield better accuracy than one. This method learns first-level filters that appear fairly generic and task-independent (oriented Gabors), which leaves open the question of whether the mid-level features would perform better if pre-trained in a more task-specific manner.

To investigate this, we augment the original sparse coding objective with a discriminative term and learn a feed-forward function to approximate the resulting sparse, discriminative codes. Combined with a new multi-scale feature pooling module, this task-oriented sparse coding method greatly increases the system's performance on several object recognition benchmarks.

These training procedures require two distinct phases: unsupervised pre-training followed by supervised fine-tuning. To simplify training, we introduce an integrated, single-phase supervised learning procedure that places an L1 penalty on the output state of each layer of the network, forcing the network to produce sparse, discriminative codes without the expensive pre-training phase. This network also includes a new pooling method that enforces a purely supervised form of group sparsity, promoting similarity between filters within a defined neighborhood. This creates a local invariance to small perturbations, increasing the robustness of the features.
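The single-phase procedure described above attaches an L1 penalty to each layer's output state, added to the ordinary supervised loss. A minimal NumPy sketch of such an activation penalty; the function name, weighting, and toy loss value are illustrative assumptions, not the thesis's exact formulation:

```python
import numpy as np

def layer_l1_penalty(activations, lam=0.01):
    """Sum of absolute activation values across layers, scaled by lam.

    Adding this term to the supervised loss pushes each layer's output
    state toward sparsity during ordinary backprop training, with no
    separate unsupervised pre-training phase.
    """
    return lam * sum(float(np.abs(a).sum()) for a in activations)

# Toy example: activation maps from two hypothetical layers.
rng = np.random.default_rng(0)
acts = [rng.standard_normal((4, 8)), rng.standard_normal((4, 4))]
supervised_loss = 0.37  # placeholder classification loss
total_loss = supervised_loss + layer_l1_penalty(acts, lam=0.01)
```

The penalty is just an extra additive term, so it changes only the gradient of the training objective, not the network architecture.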
Finally, we study the effect of contrast normalization on the network's internal representation and show that this nonlinearity preserves more information about the input in the output feature maps, leading to better discriminability among object categories.
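As a rough illustration of the kind of nonlinearity studied here, a subtractive-plus-divisive local contrast normalization over a single 2D feature map can be sketched as follows; the window shape, reflect padding, and divisive floor are simplifying assumptions, not the exact scheme used in the thesis:

```python
import numpy as np

def local_contrast_normalize(x, radius=1, eps=1e-6):
    """Subtractive + divisive contrast normalization of a 2D map.

    For each pixel: subtract the mean of its (2*radius+1)^2 neighborhood
    (subtractive step), then divide by the neighborhood standard
    deviation, floored at eps to avoid division by zero (divisive step).
    """
    p = radius
    xp = np.pad(x, p, mode="reflect")  # reflect-pad so borders have full windows
    out = np.empty_like(x, dtype=float)
    h, w = x.shape
    for i in range(h):
        for j in range(w):
            win = xp[i:i + 2 * p + 1, j:j + 2 * p + 1]
            centered = x[i, j] - win.mean()
            out[i, j] = centered / max(float(win.std()), eps)
    return out
```

On a constant input the output is identically zero, which shows the normalization discards absolute intensity and keeps only local contrast.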
Keywords
sparse coding, classical sparse coding, task-oriented sparse coding, optimal sparse codes, multistage convolutional architectures, feed-forward function, discriminative codes, training procedures, object recognition benchmarks, sparse convolutional feature hierarchies