# Fast Training of Convolutional Networks through FFTs

International Conference on Learning Representations (ICLR), 2014.

Abstract:

Convolutional networks are one of the most widely employed architectures in computer vision and machine learning. In order to leverage their ability to learn complex functions, large amounts of data are required for training. Training a large convolutional network to produce state-of-the-art results can take weeks, even when using moder...

Introduction

- As computer vision and machine learning aim to solve increasingly challenging tasks, models of greater complexity are required.
- While early benchmark datasets in machine learning contained thousands or tens of thousands of samples [7, 3, 10], current datasets contain on the order of millions of samples [6, 2].
- This raises the challenge of training networks in a feasible amount of time.
- Fast algorithms for training and inference are therefore needed.

Highlights

- As computer vision and machine learning aim to solve increasingly challenging tasks, models of greater complexity are required.
- While early benchmark datasets in machine learning contained thousands or tens of thousands of samples [7, 3, 10], current datasets contain on the order of millions of samples [6, 2].
- We present a simple and fast algorithm that accelerates training and inference using convolutional networks.
- In the future we plan to explore the possibility of learning kernels directly in the Fourier domain. Another interesting direction would be to apply non-linearities in the Fourier domain rather than in the spatial domain, since this would remove the need for inverse transforms and accelerate training and inference further.
- Using input images of size 34 × 34 is suboptimal in terms of speed, since they must be padded to 64 × 64.
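
The role of the forward and inverse transforms mentioned above can be illustrated with a minimal sketch. It is a 1-D simplification written in plain Python, not the authors' GPU implementation: the function names (`fft`, `fft_convolve`, `direct_convolve`) are hypothetical, and the radix-2 FFT with zero-padding to a power of two is an assumption made for brevity. The sketch transforms both inputs, multiplies them pointwise in the Fourier domain, and applies one inverse transform to return to the spatial domain.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_convolve(signal, kernel):
    """Full linear convolution: forward FFTs, pointwise product, inverse FFT."""
    out_len = len(signal) + len(kernel) - 1
    size = 1
    while size < out_len:  # zero-pad to a power of two (assumed radix-2 FFT)
        size *= 2
    fs = fft(list(signal) + [0] * (size - len(signal)))
    fk = fft(list(kernel) + [0] * (size - len(kernel)))
    # In the Fourier domain, convolution becomes elementwise multiplication.
    product = [a * b for a, b in zip(fs, fk)]
    # Inverse transform back to the spatial domain, then normalize by size.
    spatial = fft(product, inverse=True)
    return [v.real / size for v in spatial[:out_len]]


def direct_convolve(signal, kernel):
    """Reference convolution computed directly in the spatial domain."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i + j] += s * w
    return out
```

Both functions should agree up to floating-point error. The inverse transform in `fft_convolve` is the step that a Fourier-domain non-linearity would make unnecessary between layers.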

Methods

- Figure 2 shows the theoretical number of operations for direct convolution and the FFT method for various input sizes.
- Current GPU implementations of the FFT such as cuFFT are designed to parallelize over individual transforms.
- This is useful for computing a limited number of transforms on large inputs, but it is not well suited to this task, which requires many FFTs over relatively small inputs.
- Note that 2-D FFTs lend themselves naturally to parallelization, since they can be decomposed into two sets of 1-D FFTs, and each set can be computed in parallel.
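
The decomposition of a 2-D FFT into two sets of 1-D FFTs can be sketched as follows. This is a sequential, plain-Python illustration under stated assumptions, not the authors' GPU code: `fft1d`, `fft2d`, and `dft2d_naive` are hypothetical names, and the 1-D transform is a textbook radix-2 Cooley-Tukey FFT that requires power-of-two lengths. Each list comprehension in `fft2d` is one set of mutually independent 1-D transforms, which is where a parallel implementation could distribute the work.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2d(image):
    """2-D FFT computed as two sets of independent 1-D FFTs."""
    # Set 1: transform every row; no row depends on any other row.
    rows = [fft1d(row) for row in image]
    # Set 2: transform every column of the row-transformed result.
    cols = [fft1d(list(col)) for col in zip(*rows)]
    # Transpose back so result[u][v] is laid out like the input.
    return [list(r) for r in zip(*cols)]


def dft2d_naive(image):
    """Direct 2-D DFT from the definition, used only to check fft2d."""
    n, m = len(image), len(image[0])
    return [
        [
            sum(
                image[y][x] * cmath.exp(-2j * cmath.pi * (u * y / n + v * x / m))
                for y in range(n)
                for x in range(m)
            )
            for v in range(m)
        ]
        for u in range(n)
    ]
```

On a small input, `fft2d` and `dft2d_naive` should match up to floating-point error, which confirms that the row-then-column decomposition computes the same transform as the direct definition.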

Conclusion

- The authors have presented a simple and fast algorithm for training and inference using convolutional networks.
- It outperforms known state-of-the-art implementations in terms of speed, as verified by numerical experiments.
- Using input images of size 34 × 34 is suboptimal in terms of speed, since they must be padded to 64 × 64, the next power of two.
- This limitation is not intrinsic to the FFT, and the authors intend to extend the implementation to accept other sizes in the future.
- In future work the authors intend to thoroughly explore the effect of input image and kernel sizes on performance.
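
The cost of the padding limitation can be estimated with a short calculation. The sketch below assumes that inputs are padded to the next power of two, which is inferred from the 34 × 34 to 64 × 64 example rather than stated as a general rule here; the helper names `next_power_of_two` and `padding_overhead` are hypothetical.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def padding_overhead(side):
    """Return (padded side, padded-to-original pixel ratio) for a square input."""
    padded = next_power_of_two(side)
    return padded, (padded * padded) / (side * side)


# Assumed padding rule: round each side up to the next power of two.
for side in (32, 34, 64):
    padded, ratio = padding_overhead(side)
    print(f"{side} x {side} -> {padded} x {padded}, {ratio:.2f}x the pixels")
```

Under this assumption, a 34 × 34 input is transformed at 64 × 64, roughly 3.5 times as many pixels as the original, while a 32 × 32 input incurs no padding at all.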

References

- [1] S. Ben-Yacoub, B. Fasel, and J. Luttin. Fast face detection using MLP and FFT. In Proceedings of the Second International Conference on Audio and Video-based Biometric Person Authentication (AVBPA 1999), 1999.
- [2] Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR 2011), 2011.
- [3] A. Bosch, A. Zisserman, and X. Munoz. Representing shape with a spatial pyramid kernel. In Proceedings of the ACM International Conference on Image and Video Retrieval, 2007.
- [4] Ronan Collobert, Koray Kavukcuoglu, and Clement Farabet. Torch7: A Matlab-like environment for machine learning. In NIPS, 2011.
- [5] James Cooley and John Tukey. An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19:297–301, 1965.
- [6] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. 2009.
- [7] L. Fei-Fei, R. Fergus, and Pietro Perona. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. 2004.
- [8] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1106–1114, 2012.
- [9] Y. LeCun, L. Bottou, G. Orr, and K. Muller. Efficient backprop. In G. Orr and K. Muller, editors, Neural Networks: Tricks of the Trade. Springer, 1998.
- [10] G. Tzanetakis and P. Cook. Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5):293–302, July 2002.
