Main Content

Estimate Performance of Deep Learning Network

To reduce the time required to design a custom deep learning network that meets performance requirements, before deploying the network, analyze layer level latencies. Compare deep learning network performances on custom bitstream processor configurations to performances on reference (shipping) bitstream processor configurations.

To learn how to use the information in the table data from the estimatePerformance function to calculate your network performance, see Profile Inference Run.

Estimate Performance of Custom Deep Learning Network for Custom Processor Configuration

This example shows how to calculate the performance of a deep learning network for a custom processor configuration.

  1. Create a file in your current working folder called getLogoNetwork.m. In the file, enter:

    function net = getLogoNetwork()
     if ~isfile('LogoNet.mat')
            url = 'https://www.mathworks.com/supportfiles/gpucoder/cnn_models/logo_detection/LogoNet.mat';
            websave('LogoNet.mat',url);
        end
        data = load('LogoNet.mat');
        net  = data.convnet;
    end

    Call the function and save the result in snet.

    snet = getLogoNetwork;
  2. Create a dlhdl.ProcessorConfig object.

    hPC = dlhdl.ProcessorConfig;
  3. Call estimatePerformance with snet to retrieve the layer level latencies and performance for the LogoNet network.

    hPC.estimatePerformance(snet)
    3 Memory Regions created.
    
    
    
                  Deep Learning Processor Estimator Performance Results
    
                       LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                             -------------             -------------              ---------        ---------       ---------
    Network                   39853460                  0.19927                       1           39853460              5.0
            conv_1             6825287                  0.03413 
            maxpool_1          3755088                  0.01878 
            conv_2            10440701                  0.05220 
            maxpool_2          1447840                  0.00724 
            conv_3             9393397                  0.04697 
            maxpool_3          1765856                  0.00883 
            conv_4             1770484                  0.00885 
            maxpool_4            28098                  0.00014 
            fc_1               2644884                  0.01322 
            fc_2               1692532                  0.00846 
            fc_3                 89293                  0.00045 
     * The clock frequency of the DL processor is: 200MHz

    To learn about the parameters and values returned by estimatePerformance, see .

Evaluate Performance of Deep Learning Network on Custom Processor Configuration

Benchmark the performance of a deep learning network on a custom bitstream configuration by comparing it to the performance on a reference (shipping) bitstream configuration. Use the comparison results to adjust your custom deep learning processor parameters to achieve optimum performance.

In this example compare the performance of the ResNet-18 network on the zcu102_single bitstream configuration to the performance on the default custom bitstream configuration.

Prerequisites

  • Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC

  • Deep Learning Toolbox™

  • Deep Learning HDL Toolbox™

  • Deep Learning Toolbox Model for ResNet-18 Network

Load Pretrained Network

Load the pretrained network.

snet = resnet18;

Retrieve zcu102_single Bitstream Configuration

To retrieve the zcu102_single bitstream configuration, use the dlhdl.ProcessorConfig object. For more information, see dlhdl.ProcessorConfig. To learn about modifiable parameters of the processor configuration, see getModuleProperty and setModuleProperty.

hPC_shipping = dlhdl.ProcessorConfig('Bitstream',"zcu102_single")
hPC_shipping = 
                    Processing Module "conv"
                            ConvThreadNumber: 16
                             InputMemorySize: [227  227    3]
                            OutputMemorySize: [227  227    3]
                            FeatureSizeLimit: 2048
                              KernelDataType: 'single'

                      Processing Module "fc"
                              FCThreadNumber: 4
                             InputMemorySize: 25088
                            OutputMemorySize: 4096
                              KernelDataType: 'single'

                   Processing Module "adder"
                             InputMemorySize: 40
                            OutputMemorySize: 40
                              KernelDataType: 'single'

                     System Level Properties
                              TargetPlatform: 'Xilinx Zynq UltraScale+ MPSoC ZCU102 Evaluation Kit'
                             TargetFrequency: 220
                               SynthesisTool: 'Xilinx Vivado'
                             ReferenceDesign: 'AXI-Stream DDR Memory Access : 3-AXIM'
                     SynthesisToolChipFamily: 'Zynq UltraScale+'
                     SynthesisToolDeviceName: 'xczu9eg-ffvb1156-2-e'
                    SynthesisToolPackageName: ''
                     SynthesisToolSpeedValue: ''

Estimate ResNet-18 Performance for zcu102_single Bitstream Configuration

To estimate the performance of the ResNet-18 DAG network, use the estimatePerformance function of the dlhdl.ProcessorConfig object. The function returns the estimated layer latency, network latency, and network performance in frames per second (Frames/s).

hPC_shipping.estimatePerformance(snet)
### Optimizing series network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
5 Memory Regions created.



              Deep Learning Processor Estimator Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   22576184                  0.10262                       1           22576184              9.7
    ____conv1              2165372                  0.00984 
    ____pool1               646226                  0.00294 
    ____res2a_branch2a      966221                  0.00439 
    ____res2a_branch2b      966221                  0.00439 
    ____res2b_branch2a      966221                  0.00439 
    ____res2b_branch2b      966221                  0.00439 
    ____res3a_branch2a      757716                  0.00344 
    ____res3a_branch2b      919117                  0.00418 
    ____res3a_branch1       540749                  0.00246 
    ____res3b_branch2a      919117                  0.00418 
    ____res3b_branch2b      919117                  0.00418 
    ____res4a_branch2a      509261                  0.00231 
    ____res4a_branch2b      905421                  0.00412 
    ____res4a_branch1       509261                  0.00231 
    ____res4b_branch2a      905421                  0.00412 
    ____res4b_branch2b      905421                  0.00412 
    ____res5a_branch2a     1013837                  0.00461 
    ____res5a_branch2b     1939661                  0.00882 
    ____res5a_branch1      1013837                  0.00461 
    ____res5b_branch2a     1939661                  0.00882 
    ____res5b_branch2b     1939661                  0.00882 
    ____pool5                54594                  0.00025 
    ____fc1000              207850                  0.00094 
 * The clock frequency of the DL processor is: 220MHz

Create Custom Processor Configuration

To create a custom processor configuration, use the dlhdl.ProcessorConfig object. For more information, see dlhdl.ProcessorConfig. To learn about modifiable parameters of the processor configuration, see getModuleProperty and setModuleProperty.

hPC_custom = dlhdl.ProcessorConfig
hPC_custom = 
                    Processing Module "conv"
                            ConvThreadNumber: 16
                             InputMemorySize: [227  227    3]
                            OutputMemorySize: [227  227    3]
                            FeatureSizeLimit: 2048
                              KernelDataType: 'single'

                      Processing Module "fc"
                              FCThreadNumber: 4
                             InputMemorySize: 25088
                            OutputMemorySize: 4096
                              KernelDataType: 'single'

                   Processing Module "adder"
                             InputMemorySize: 40
                            OutputMemorySize: 40
                              KernelDataType: 'single'

                     System Level Properties
                              TargetPlatform: 'Xilinx Zynq UltraScale+ MPSoC ZCU102 Evaluation Kit'
                             TargetFrequency: 200
                               SynthesisTool: 'Xilinx Vivado'
                             ReferenceDesign: 'AXI-Stream DDR Memory Access : 3-AXIM'
                     SynthesisToolChipFamily: 'Zynq UltraScale+'
                     SynthesisToolDeviceName: 'xczu9eg-ffvb1156-2-e'
                    SynthesisToolPackageName: ''
                     SynthesisToolSpeedValue: ''

Estimate ResNet-18 Performance for Custom Bitstream Configuration

To estimate the performance of the ResNet-18 DAG network, use the estimatePerformance function of the dlhdl.ProcessorConfig object. The function returns the estimated layer latency, network latency, and network performance in frames per second (Frames/s).

hPC_custom.estimatePerformance(snet)
### Optimizing series network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
5 Memory Regions created.



              Deep Learning Processor Estimator Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   22575683                  0.11288                       1           22575683              8.9
    ____conv1              2165372                  0.01083 
    ____pool1               646226                  0.00323 
    ____res2a_branch2a      966221                  0.00483 
    ____res2a_branch2b      966221                  0.00483 
    ____res2b_branch2a      966221                  0.00483 
    ____res2b_branch2b      966221                  0.00483 
    ____res3a_branch2a      757716                  0.00379 
    ____res3a_branch2b      919117                  0.00460 
    ____res3a_branch1       540749                  0.00270 
    ____res3b_branch2a      919117                  0.00460 
    ____res3b_branch2b      919117                  0.00460 
    ____res4a_branch2a      509261                  0.00255 
    ____res4a_branch2b      905421                  0.00453 
    ____res4a_branch1       509261                  0.00255 
    ____res4b_branch2a      905421                  0.00453 
    ____res4b_branch2b      905421                  0.00453 
    ____res5a_branch2a     1013837                  0.00507 
    ____res5a_branch2b     1939661                  0.00970 
    ____res5a_branch1      1013837                  0.00507 
    ____res5b_branch2a     1939661                  0.00970 
    ____res5b_branch2b     1939661                  0.00970 
    ____pool5                54594                  0.00027 
    ____fc1000              207349                  0.00104 
 * The clock frequency of the DL processor is: 200MHz

The performance of the ResNet-18 network on the custom bitstream configuration is lower than the performance on the zcu102_single bitstream configuration. The difference between the custom bitstream configuration and the zcu102_single bitstream configuration is the target frequency.

Modify Custom Processor Configuration

Modify the custom processor configuration to increase the target frequency. To learn about modifiable parameters of the processor configuration, see dlhdl.ProcessorConfig.

hPC_custom.TargetFrequency = 220;
hPC_custom
hPC_custom = 
                    Processing Module "conv"
                            ConvThreadNumber: 16
                             InputMemorySize: [227  227    3]
                            OutputMemorySize: [227  227    3]
                            FeatureSizeLimit: 2048
                              KernelDataType: 'single'

                      Processing Module "fc"
                              FCThreadNumber: 4
                             InputMemorySize: 25088
                            OutputMemorySize: 4096
                              KernelDataType: 'single'

                   Processing Module "adder"
                             InputMemorySize: 40
                            OutputMemorySize: 40
                              KernelDataType: 'single'

                     System Level Properties
                              TargetPlatform: 'Xilinx Zynq UltraScale+ MPSoC ZCU102 Evaluation Kit'
                             TargetFrequency: 220
                               SynthesisTool: 'Xilinx Vivado'
                             ReferenceDesign: 'AXI-Stream DDR Memory Access : 3-AXIM'
                     SynthesisToolChipFamily: 'Zynq UltraScale+'
                     SynthesisToolDeviceName: 'xczu9eg-ffvb1156-2-e'
                    SynthesisToolPackageName: ''
                     SynthesisToolSpeedValue: ''

Re-estimate ResNet-18 Performance for Modified Custom Bitstream Configuration

Estimate the performance of the ResNet-18 DAG network on the modified custom bitstream configuration.

hPC_custom.estimatePerformance(snet)
### Optimizing series network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
5 Memory Regions created.



              Deep Learning Processor Estimator Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   22576184                  0.10262                       1           22576184              9.7
    ____conv1              2165372                  0.00984 
    ____pool1               646226                  0.00294 
    ____res2a_branch2a      966221                  0.00439 
    ____res2a_branch2b      966221                  0.00439 
    ____res2b_branch2a      966221                  0.00439 
    ____res2b_branch2b      966221                  0.00439 
    ____res3a_branch2a      757716                  0.00344 
    ____res3a_branch2b      919117                  0.00418 
    ____res3a_branch1       540749                  0.00246 
    ____res3b_branch2a      919117                  0.00418 
    ____res3b_branch2b      919117                  0.00418 
    ____res4a_branch2a      509261                  0.00231 
    ____res4a_branch2b      905421                  0.00412 
    ____res4a_branch1       509261                  0.00231 
    ____res4b_branch2a      905421                  0.00412 
    ____res4b_branch2b      905421                  0.00412 
    ____res5a_branch2a     1013837                  0.00461 
    ____res5a_branch2b     1939661                  0.00882 
    ____res5a_branch1      1013837                  0.00461 
    ____res5b_branch2a     1939661                  0.00882 
    ____res5b_branch2b     1939661                  0.00882 
    ____pool5                54594                  0.00025 
    ____fc1000              207850                  0.00094 
 * The clock frequency of the DL processor is: 220MHz

See Also

| | | |

Related Topics