Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding
International Conference on Learning Representations (ICLR), 2016.
Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we introduce "deep compression", a three-stage pipeline of pruning, trained quantization, and Huffman coding that works together to reduce the storage requirements.
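The three stages can be sketched as a toy pipeline on a single weight array. This is an illustrative assumption-laden sketch, not the paper's implementation: the paper fine-tunes both the pruned network and the shared centroids during training, whereas here the pruning threshold, cluster count, and k-means details are placeholder choices.

```python
import heapq
from collections import Counter

import numpy as np


def prune(weights, threshold=0.2):
    """Stage 1: magnitude pruning -- zero out weights below a threshold."""
    return np.where(np.abs(weights) < threshold, 0.0, weights)


def quantize(weights, n_clusters=4, n_iters=10):
    """Stage 2: weight sharing -- replace each nonzero weight by its nearest
    k-means centroid (a tiny, untrained stand-in for the paper's trained
    quantization). Returns the quantized weights and per-weight cluster ids."""
    nonzero = weights[weights != 0]
    # Linear initialization of centroids over the weight range.
    centroids = np.linspace(nonzero.min(), nonzero.max(), n_clusters)
    for _ in range(n_iters):
        assign = np.argmin(np.abs(nonzero[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            if np.any(assign == k):
                centroids[k] = nonzero[assign == k].mean()
    out = weights.copy()
    mask = weights != 0
    idx = np.argmin(np.abs(weights[mask][:, None] - centroids[None, :]), axis=1)
    out[mask] = centroids[idx]
    return out, idx


def huffman_bits(symbols):
    """Stage 3: Huffman coding of the cluster ids -- total encoded bit length."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return len(symbols)  # degenerate alphabet: one bit per symbol
    # Heap of (frequency, tiebreak, {symbol: code_length_so_far}).
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, i2, c2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every code beneath them.
        merged = {s: l + 1 for s, l in {**c1, **c2}.items()}
        heapq.heappush(heap, (f1 + f2, i2, merged))
    _, _, lengths = heap[0]
    return sum(freq[s] * lengths[s] for s in freq)


rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.5, size=64)
pruned = prune(w)
quantized, idx = quantize(pruned)
bits = huffman_bits(list(idx))
```

Since Huffman coding is optimal among prefix codes, `bits` never exceeds the 2 bits per index that a fixed-length code for four clusters would need, and it shrinks further when the cluster distribution is skewed.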