
Quantization, Projection, and Pruning

Compress a deep neural network by performing quantization, projection, or pruning

Use Deep Learning Toolbox™ together with the Deep Learning Toolbox Model Quantization Library support package to reduce the memory footprint and computational requirements of a deep neural network by:

  • Pruning filters from convolution layers by using first-order Taylor approximation. You can then generate C/C++ or CUDA® code from this pruned network.

  • Projecting layers by performing principal component analysis (PCA) on the layer activations, using a data set representative of the training data, and then applying linear projections to the layer learnable parameters. Forward passes of a projected deep neural network are typically faster when you deploy the network to embedded hardware using library-free C/C++ code generation.

  • Quantizing the weights, biases, and activations of layers to reduced-precision scaled integer data types. You can then generate C/C++, CUDA, or HDL code from this quantized network.

    For C/C++ and CUDA code generation, the software generates code for a convolutional deep neural network by quantizing the weights, biases, and activations of the convolution layers to 8-bit scaled integer data types. To perform the quantization, pass the calibration result file produced by the calibrate function to the codegen (MATLAB Coder) command.

    Code generation does not support quantized deep neural networks produced by the quantize function.
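
The idea of an 8-bit scaled integer data type can be illustrated with a short, language-neutral sketch. The Python code below is not the toolbox implementation. It assumes symmetric power-of-two scaling chosen from a calibrated range, and the helper names choose_exponent, quantize_int8, and dequantize_int8 are hypothetical names introduced only for this illustration.

```python
import math

# Illustrative sketch only: symmetric power-of-two scaling is an assumption,
# not a statement about how the toolbox picks its scaling.

def choose_exponent(max_abs, bits=8):
    """Return the smallest exponent e such that max_abs / 2**e fits in a signed integer."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    if max_abs == 0:
        return 0
    return math.ceil(math.log2(max_abs / qmax))

def quantize_int8(values, exponent):
    """Scale by 2**exponent, round to the nearest integer, and saturate to [-128, 127]."""
    scale = 2.0 ** exponent
    return [max(-128, min(127, round(v / scale))) for v in values]

def dequantize_int8(q_values, exponent):
    """Map stored integers back to real values."""
    scale = 2.0 ** exponent
    return [q * scale for q in q_values]

# "Calibration": observe the dynamic range of some representative values.
weights = [0.5, -1.0, 0.25, 0.9]
exponent = choose_exponent(max(abs(w) for w in weights))

q_weights = quantize_int8(weights, exponent)
approx = dequantize_int8(q_weights, exponent)
```

For in-range values, the round-trip error is at most half of the scale 2**exponent. Values outside the calibrated range saturate, which is why the calibration data should be representative of the data the network sees at inference time.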



taylorPrunableNetwork - Network that can be pruned by using first-order Taylor approximation (Since R2022a)
forward - Compute deep learning network output for training (Since R2019b)
predict - Compute deep learning network output for inference (Since R2019b)
updatePrunables - Remove filters from prunable layers based on importance scores (Since R2022a)
updateScore - Compute and accumulate Taylor-based importance scores for pruning (Since R2022a)
dlnetwork - Deep learning neural network (Since R2019b)
compressNetworkUsingProjection - Compress neural network using projection (Since R2022b)
neuronPCA - Principal component analysis of neuron activations (Since R2022b)
unpackProjectedLayers - Unpack projected layers of neural network (Since R2023b)
ProjectedLayer - Compressed neural network layer using projection (Since R2023b)
gruProjectedLayer - Gated recurrent unit (GRU) projected layer for recurrent neural network (RNN) (Since R2023b)
lstmProjectedLayer - Long short-term memory (LSTM) projected layer for recurrent neural network (RNN) (Since R2022b)
dlquantizer - Quantize a deep neural network to 8-bit scaled integer data types (Since R2020a)
dlquantizationOptions - Options for quantizing a trained deep neural network (Since R2020a)
calibrate - Simulate and collect ranges of a deep neural network (Since R2020a)
quantize - Quantize deep neural network (Since R2022a)
validate - Quantize and validate a deep neural network (Since R2020a)
quantizationDetails - Display quantization details for a neural network (Since R2022a)
estimateNetworkMetrics - Estimate network metrics for specific layers of a neural network (Since R2022a)
equalizeLayers - Equalize layer parameters of deep neural network (Since R2022b)


Deep Network Quantizer - Quantize deep neural network to 8-bit scaled integer data types (Since R2020a)



Projection and Knowledge Distillation


Quantization for GPU Target

Quantization for FPGA Target

Quantization for CPU Target