Main Content

Quantize Network for FPGA Deployment

This example shows how to quantize learnable parameters in the convolution layers of a neural network, and validate the quantized network. Rapidly prototype the quantized network by using MATLAB simulation or an FPGA to validate the quantized network. In this example, you quantize the LogoNet neural network.

For this example, you need:

  • Deep Learning Toolbox ™

  • Deep Learning HDL Toolbox ™

  • Deep Learning Toolbox Model Quantization Library

  • Deep Learning HDL Toolbox Support Package for Xlinx FPGA and SoC Devices

  • MATLAB Coder Interface for Deep Learning Libraries.

Load Pretrained Network

To load the pretrained LogoNet network and analyze the network architecture, enter:

snet = getLogoNetwork;
analyzeNetwork(snet);

Define Calibration and Validation Data Sets

This example uses the logos_dataset data set. The data set consists of 320 images. Each image is 227-by-227 in size and has three color channels (RGB). Create an augmentedImageDatastore object to use for calibration and validation. Expedite calibration and validation by reducing the calibration data set to 20 images. The MATLAB simulation workflow has a maximum limit of five images when validating the quantized network. Reduce the validation data set sizes to five images. The FPGA validation workflow has a maximum limit of one image when validating the quantized network. Reduce the FPGA validation data set to a single image.

curDir = pwd;
newDir = fullfile(matlabroot,'examples','deeplearning_shared','data','logos_dataset.zip');
copyfile(newDir,curDir,'f');
unzip('logos_dataset.zip');
imageData = imageDatastore(fullfile(curDir,'logos_dataset'),...
'IncludeSubfolders',true,'FileExtensions','.JPG','LabelSource','foldernames');
[calibrationData, validationData] = splitEachLabel(imageData, 0.5,'randomized');
calibrationData_reduced = calibrationData.subset(1:20);
validationData_simulation = validationData.subset(1:5);
validationData_FPGA = validationData.subset(1:1);

Create Quantized Network Object

Create a dlquantizer object and specify the network to quantize. Specify the execution environment as FPGA for one object. Run the MATLAB simulation on for the second dlquantizer object.

dlQuantObj_simulation = dlquantizer(snet,'ExecutionEnvironment',"FPGA",'Simulation','on');
dlQuantObj_FPGA = dlquantizer(snet,'ExecutionEnvironment',"FPGA");

Calibrate Quantized Network

Use the calibrate function to exercise the network with sample inputs and collect the range information. The calibrate function exercises the network and collects the dynamic ranges of the weights and biases. The calibrate function returns a table. Each row of the table contains range information for a learnable parameter of the quantized network.

dlQuantObj_simulation.calibrate(calibrationData_reduced)
ans=35×5 table
        Optimized Layer Name        Network Layer Name    Learnables / Activations     MinValue       MaxValue 
    ____________________________    __________________    ________________________    ___________    __________

    {'conv_1_Weights'          }      {'conv_1'    }           "Weights"                -0.048978      0.039352
    {'conv_1_Bias'             }      {'conv_1'    }           "Bias"                     0.99996        1.0028
    {'conv_2_Weights'          }      {'conv_2'    }           "Weights"                -0.055518      0.061901
    {'conv_2_Bias'             }      {'conv_2'    }           "Bias"                 -0.00061171       0.00227
    {'conv_3_Weights'          }      {'conv_3'    }           "Weights"                -0.045942      0.046927
    {'conv_3_Bias'             }      {'conv_3'    }           "Bias"                  -0.0013998     0.0015218
    {'conv_4_Weights'          }      {'conv_4'    }           "Weights"                -0.045967         0.051
    {'conv_4_Bias'             }      {'conv_4'    }           "Bias"                    -0.00164     0.0037892
    {'fc_1_Weights'            }      {'fc_1'      }           "Weights"                -0.051394      0.054344
    {'fc_1_Bias'               }      {'fc_1'      }           "Bias"                 -0.00052319    0.00084454
    {'fc_2_Weights'            }      {'fc_2'      }           "Weights"                 -0.05016      0.051557
    {'fc_2_Bias'               }      {'fc_2'      }           "Bias"                  -0.0017564     0.0018502
    {'fc_3_Weights'            }      {'fc_3'      }           "Weights"                -0.050706       0.04678
    {'fc_3_Bias'               }      {'fc_3'      }           "Bias"                    -0.02951      0.024855
    {'imageinput'              }      {'imageinput'}           "Activations"                    0           255
    {'imageinput_normalization'}      {'imageinput'}           "Activations"              -139.34        198.11
      ⋮

dlQuantObj_FPGA.calibrate(calibrationData_reduced)
ans=35×5 table
        Optimized Layer Name        Network Layer Name    Learnables / Activations     MinValue       MaxValue 
    ____________________________    __________________    ________________________    ___________    __________

    {'conv_1_Weights'          }      {'conv_1'    }           "Weights"                -0.048978      0.039352
    {'conv_1_Bias'             }      {'conv_1'    }           "Bias"                     0.99996        1.0028
    {'conv_2_Weights'          }      {'conv_2'    }           "Weights"                -0.055518      0.061901
    {'conv_2_Bias'             }      {'conv_2'    }           "Bias"                 -0.00061171       0.00227
    {'conv_3_Weights'          }      {'conv_3'    }           "Weights"                -0.045942      0.046927
    {'conv_3_Bias'             }      {'conv_3'    }           "Bias"                  -0.0013998     0.0015218
    {'conv_4_Weights'          }      {'conv_4'    }           "Weights"                -0.045967         0.051
    {'conv_4_Bias'             }      {'conv_4'    }           "Bias"                    -0.00164     0.0037892
    {'fc_1_Weights'            }      {'fc_1'      }           "Weights"                -0.051394      0.054344
    {'fc_1_Bias'               }      {'fc_1'      }           "Bias"                 -0.00052319    0.00084454
    {'fc_2_Weights'            }      {'fc_2'      }           "Weights"                 -0.05016      0.051557
    {'fc_2_Bias'               }      {'fc_2'      }           "Bias"                  -0.0017564     0.0018502
    {'fc_3_Weights'            }      {'fc_3'      }           "Weights"                -0.050706       0.04678
    {'fc_3_Bias'               }      {'fc_3'      }           "Bias"                    -0.02951      0.024855
    {'imageinput'              }      {'imageinput'}           "Activations"                    0           255
    {'imageinput_normalization'}      {'imageinput'}           "Activations"              -139.34        198.11
      ⋮

Create Target Object

Create a target object with a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet. To use JTAG, install Xilinx™ Vivado™ Design Suite 2020.1. To set the Xilinx Vivado toolpath, enter:

% hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2020.1\bin\vivado.bat');

To create the target object, enter:

hTarget = dlhdl.Target('Xilinx','Interface','Ethernet');

Alternatively, you can also use the JTAG interface.

% hTarget = dlhdl.Target('Xilinx', 'Interface', 'JTAG');

Create dlQuantizationOptions Object

Create a dlquantizationOptions object. Specify the target bitstream and target board interface. The default metric function is a Top-1 accuracy metric function.

options_FPGA = dlquantizationOptions('Bitstream','zcu102_int8','Target',hTarget);
options_simulation = dlquantizationOptions;

To use a custom metric function, specify the metric function in the dlquantizationOptions object.

% options_FPGA = dlquantizationOptions('MetricFcn',{@(x)hComputeAccuracy(x, snet, validationData_FPGA)},'Bitstream','zcu102_int8','Target',hTarget);
% options_simulation = dlquantizationOptions('MetricFcn',{@(x)hComputeAccuracy(x, snet,validationData_simulation)})

Validate Quantized Network

Use the validate function to quantize the learnable parameters in the convolution layers of the network. The validate function simulates the quantized network in MATLAB. The validate function uses the metric function defined in the dlquantizationOptions object to compare the results of the single data type network object to the results of the quantized network object.

prediction_simulation = dlQuantObj_simulation.validate(validationData_simulation,options_simulation)
### Notice: (Layer  1) The layer 'imageinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: (Layer  2) The layer 'out_imageinput' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.
Compiling leg: conv_1>>maxpool_4 ...
### Notice: (Layer  1) The layer 'imageinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: (Layer 14) The layer 'output' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.
Compiling leg: conv_1>>maxpool_4 ... complete.
Compiling leg: fc_1>>fc_3 ...
### Notice: (Layer  1) The layer 'maxpool_4' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: (Layer  7) The layer 'output' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.
Compiling leg: fc_1>>fc_3 ... complete.
### Should not enter here. It means a component is unaccounted for in MATLAB Emulation.
### Notice: (Layer  1) The layer 'fc_3' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: (Layer  2) The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: (Layer  3) The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.
prediction_simulation = struct with fields:
       NumSamples: 5
    MetricResults: [1×1 struct]

For an FPGA based validation, The validate function uses the output of the compile function to program the FPGA board by using the programming file. It also downloads the network weights and biases. The validate function uses the metric function defined in the dlquantizationOptions object to compare the results of the network before and after quantization.

prediction_FPGA = dlQuantObj_FPGA.validate(validationData_FPGA,options_FPGA)
### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream zcu102_int8 ...
### The network includes the following layers:

     1   'imageinput'    Image Input             227×227×3 images with 'zerocenter' normalization and 'randfliplr' augmentations  (SW Layer)
     2   'conv_1'        Convolution             96 5×5×3 convolutions with stride [1  1] and padding [0  0  0  0]                (HW Layer)
     3   'relu_1'        ReLU                    ReLU                                                                             (HW Layer)
     4   'maxpool_1'     Max Pooling             3×3 max pooling with stride [2  2] and padding [0  0  0  0]                      (HW Layer)
     5   'conv_2'        Convolution             128 3×3×96 convolutions with stride [1  1] and padding [0  0  0  0]              (HW Layer)
     6   'relu_2'        ReLU                    ReLU                                                                             (HW Layer)
     7   'maxpool_2'     Max Pooling             3×3 max pooling with stride [2  2] and padding [0  0  0  0]                      (HW Layer)
     8   'conv_3'        Convolution             384 3×3×128 convolutions with stride [1  1] and padding [0  0  0  0]             (HW Layer)
     9   'relu_3'        ReLU                    ReLU                                                                             (HW Layer)
    10   'maxpool_3'     Max Pooling             3×3 max pooling with stride [2  2] and padding [0  0  0  0]                      (HW Layer)
    11   'conv_4'        Convolution             128 3×3×384 convolutions with stride [2  2] and padding [0  0  0  0]             (HW Layer)
    12   'relu_4'        ReLU                    ReLU                                                                             (HW Layer)
    13   'maxpool_4'     Max Pooling             3×3 max pooling with stride [2  2] and padding [0  0  0  0]                      (HW Layer)
    14   'fc_1'          Fully Connected         2048 fully connected layer                                                       (HW Layer)
    15   'relu_5'        ReLU                    ReLU                                                                             (HW Layer)
    16   'dropout_1'     Dropout                 50% dropout                                                                      (HW Layer)
    17   'fc_2'          Fully Connected         2048 fully connected layer                                                       (HW Layer)
    18   'relu_6'        ReLU                    ReLU                                                                             (HW Layer)
    19   'dropout_2'     Dropout                 50% dropout                                                                      (HW Layer)
    20   'fc_3'          Fully Connected         32 fully connected layer                                                         (HW Layer)
    21   'softmax'       Softmax                 softmax                                                                          (SW Layer)
    22   'classoutput'   Classification Output   crossentropyex with 'adidas' and 31 other classes                                (SW Layer)

3 Memory Regions created.

Skipping: imageinput
Compiling leg: conv_1>>maxpool_4 ...
Compiling leg: conv_1>>maxpool_4 ... complete.
Compiling leg: fc_1>>fc_3 ...
Compiling leg: fc_1>>fc_3 ... complete.
Skipping: softmax
Skipping: classoutput
Creating Schedule...
.........
Creating Schedule...complete.
Creating Status Table...
........
Creating Status Table...complete.
Emitting Schedule...
......
Emitting Schedule...complete.
Emitting Status Table...
..........
Emitting Status Table...complete.

### Allocating external memory buffers:

          offset_name          offset_address     allocated_space 
    _______________________    ______________    _________________

    "InputDataOffset"           "0x00000000"     "48.0 MB"        
    "OutputResultOffset"        "0x03000000"     "4.0 MB"         
    "SchedulerDataOffset"       "0x03400000"     "4.0 MB"         
    "SystemBufferOffset"        "0x03800000"     "60.0 MB"        
    "InstructionDataOffset"     "0x07400000"     "8.0 MB"         
    "ConvWeightDataOffset"      "0x07c00000"     "12.0 MB"        
    "FCWeightDataOffset"        "0x08800000"     "12.0 MB"        
    "EndOffset"                 "0x09400000"     "Total: 148.0 MB"

### Network compilation complete.

### FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA.
### Deep learning network programming has been skipped as the same network is already loaded on the target FPGA.
### Finished writing input activations.
### Running single input activations.


              Deep Learning Processor Profiler Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   12722978                  0.05783                       1           12722978             17.3
    conv_1                 3437086                  0.01562 
    maxpool_1              1296014                  0.00589 
    conv_2                 2813632                  0.01279 
    maxpool_2               477861                  0.00217 
    conv_3                 2462903                  0.01120 
    maxpool_3               535330                  0.00243 
    conv_4                  504820                  0.00229 
    maxpool_4                 8965                  0.00004 
    fc_1                    687629                  0.00313 
    fc_2                    439923                  0.00200 
    fc_3                     58721                  0.00027 
 * The clock frequency of the DL processor is: 220MHz
prediction_FPGA = struct with fields:
             NumSamples: 1
          MetricResults: [1×1 struct]
    QuantizedNetworkFPS: 17.2915

View Performance of Quantized Neural Network

Examine the MetricResults.Result field of the validation output to see the performance of the quantized network.

prediction_simulation.MetricResults.Result
ans=2×2 table
    NetworkImplementation    MetricOutput
    _____________________    ____________

     {'Floating-Point'}           1      
     {'Quantized'     }           1      

prediction_FPGA.MetricResults.Result
ans=2×2 table
    NetworkImplementation    MetricOutput
    _____________________    ____________

     {'Floating-Point'}           1      
     {'Quantized'     }           1      

Examine the QuantizedNetworkFPS field of the validation output to see the frames per second performance of the quantized network.

prediction_FPGA.QuantizedNetworkFPS
ans = 17.2915

See Also