Self-Growing Binary Activation Network: A Novel Deep Learning Model With Dynamic Architecture

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2024)

Abstract
For a deep learning model, the network architecture is crucial: a model with an inappropriate architecture often suffers from performance degradation or parameter redundancy. However, finding an appropriate architecture for a given application is an empirical and difficult process. To tackle this problem, we propose a novel deep learning model with a dynamic architecture, named the self-growing binary activation network (SGBAN), which extends the design of a fully connected network (FCN) progressively, resulting in a more compact architecture with higher performance on a given task. This constructive process is more efficient than neural architecture search methods, which train a large number of networks to search for the optimal one. Concretely, the training technique of SGBAN is based on function-preserving transformations that expand the architecture and incorporate the information in new data without neglecting the knowledge learned in previous steps. Experimental results on four classification tasks, i.e., Iris, MNIST, CIFAR-10, and CIFAR-100, demonstrate the effectiveness of SGBAN. On the one hand, SGBAN achieves competitive accuracy compared with an FCN of the same architecture, which indicates that the new training technique has optimization ability equivalent to that of traditional optimization methods. On the other hand, the architecture generated by SGBAN achieves a 0.59% improvement in accuracy with only 33.44% of the parameters compared with FCNs of manually designed architectures, i.e., 500 + 150 hidden units, on MNIST. Furthermore, we demonstrate that replacing the fully connected layers of a well-trained VGG-19 with SGBAN yields slightly improved performance with less than 1% of the parameters on all these tasks. Finally, we show that the proposed method can handle incremental learning and outperforms three prominent incremental learning methods, i.e., learning without forgetting, elastic weight consolidation, and gradient episodic memory, on the incremental learning tasks on Disjoint MNIST and Disjoint CIFAR-10.
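To make the growth mechanism concrete, the sketch below illustrates one simple form of function-preserving widening for a fully connected layer with a binary activation: newly added hidden units receive zero outgoing weights, so the network's output is unchanged at the moment of growth, and the new capacity is then trained on subsequent data. This is a minimal, hypothetical illustration of the general idea in PyTorch; the specific transformations, growth criteria, and training schedule of SGBAN are defined in the paper and may differ. The class names (`BinaryAct`, `GrowingBinaryNet`) and the straight-through estimator for the binary activation are assumptions introduced here for illustration only.

```python
# Minimal sketch (not the authors' code): grow a one-hidden-layer binary
# activation network by adding hidden units whose outgoing weights start at
# zero, so the mapping computed by the network is unchanged at the moment of
# growth (a function-preserving transformation).
import torch
import torch.nn as nn


class BinaryAct(torch.autograd.Function):
    """sign(x) in the forward pass, straight-through (clipped identity) backward."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()


class GrowingBinaryNet(nn.Module):
    def __init__(self, in_dim, hidden, out_dim):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)

    def forward(self, x):
        return self.fc2(BinaryAct.apply(self.fc1(x)))

    @torch.no_grad()
    def grow(self, new_units):
        """Add `new_units` hidden neurons without changing the network's output."""
        in_dim, hidden = self.fc1.in_features, self.fc1.out_features
        out_dim = self.fc2.out_features

        fc1 = nn.Linear(in_dim, hidden + new_units)
        fc2 = nn.Linear(hidden + new_units, out_dim)

        # Keep the old incoming weights; new incoming weights stay random and free to learn.
        fc1.weight[:hidden] = self.fc1.weight
        fc1.bias[:hidden] = self.fc1.bias
        # Zero outgoing weights for the new units, so they contribute nothing yet.
        fc2.weight[:, :hidden] = self.fc2.weight
        fc2.weight[:, hidden:] = 0.0
        fc2.bias.copy_(self.fc2.bias)

        self.fc1, self.fc2 = fc1, fc2


# Usage: check that the growth step preserves the function.
net = GrowingBinaryNet(4, 8, 3)      # e.g. Iris-sized inputs and outputs
x = torch.randn(16, 4)
before = net(x)
net.grow(4)                          # widen the hidden layer by 4 units
after = net(x)
print(torch.allclose(before, after))  # True: same outputs before and after growth
```

In this scheme, preserving the function lets the grown network keep all previously learned behavior while the zero-initialized connections absorb new information, which is the property the abstract attributes to function-preserving transformations.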
Keywords
Training, Computer architecture, Neurons, Task analysis, Deep learning, Mathematical models, Data models, Binary activation function, function-preserving transformation, incremental learning, network compression, neural architecture search (NAS)