Quantize a deep neural network to 8-bit scaled integer data types
Use the dlquantizer object to reduce the memory requirement of a deep neural network by quantizing weights, biases, and activations to 8-bit scaled integer data types.
quantObj = dlquantizer(net) creates a dlquantizer object for the specified network.
quantObj = dlquantizer(net,Name,Value) creates a dlquantizer object for the specified network, with additional options specified by one or more name-value pair arguments.
Use dlquantizer to create a quantized network for GPU, FPGA, or CPU deployment. To learn about the products required to quantize and deploy the deep learning network to a GPU, FPGA, or CPU environment, see Quantization Workflow Prerequisites.
net - Pretrained neural network
DAGNetwork object | SeriesNetwork object | yolov2ObjectDetector object | ssdObjectDetector object
Pretrained neural network, specified as a DAGNetwork, SeriesNetwork, yolov2ObjectDetector (Computer Vision Toolbox), or ssdObjectDetector (Computer Vision Toolbox) object.
Quantization of ssdObjectDetector (Computer Vision Toolbox) networks requires the ExecutionEnvironment property to be set to 'FPGA'.
NetworkObject - Pretrained neural network
DAGNetwork object | SeriesNetwork object | yolov2ObjectDetector object | ssdObjectDetector object
Pretrained neural network, specified as a DAGNetwork, SeriesNetwork, yolov2ObjectDetector (Computer Vision Toolbox), or ssdObjectDetector (Computer Vision Toolbox) object.
Quantization of ssdObjectDetector (Computer Vision Toolbox) networks requires the ExecutionEnvironment property to be set to 'FPGA'.
ExecutionEnvironment - Execution environment
Execution environment for the quantized network. If you do not specify this parameter, the default execution environment is GPU. To learn about the products required to quantize and deploy the deep learning network to a GPU, FPGA, or CPU environment, see Quantization Workflow Prerequisites.
Example: 'ExecutionEnvironment','FPGA'
Simulation - Enable or disable MATLAB® simulation workflow
Enable or disable the MATLAB simulation workflow. When this parameter is set to 'on', the quantized network is validated by simulating the quantized network in MATLAB and comparing the single data type network prediction results to the simulated network prediction results.
Example: 'Simulation','on'
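As a minimal sketch of how the two name-value pairs above might be combined, the following creates a dlquantizer object that targets an FPGA and enables the MATLAB simulation workflow. It assumes that the VGG-19 support package is installed and that your release accepts Simulation together with the 'FPGA' execution environment; the variable name quantObjSim is illustrative only.

```matlab
% Assumption: the Deep Learning Toolbox Model for VGG-19 Network support
% package is installed, so that vgg19 returns a pretrained network.
net = vgg19;

% Target an FPGA and request validation by simulation in MATLAB.
% quantObjSim is a hypothetical variable name used only in this sketch.
quantObjSim = dlquantizer(net, ...
    'ExecutionEnvironment','FPGA', ...
    'Simulation','on');

% Display the object to check which settings were applied.
disp(quantObjSim)
```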
This example shows how to specify an FPGA execution environment.
net = vgg19;
quantobj = dlquantizer(net,'ExecutionEnvironment','FPGA');
This example shows how to quantize learnable parameters in the convolution layers of a neural network, and explore the behavior of the quantized network. In this example, you quantize the squeezenet neural network after retraining the network to classify new images according to the Train Deep Learning Network to Classify New Images example. In this example, the memory required for the network is reduced approximately 75% through quantization while the accuracy of the network is not affected.
Load the pretrained network. net is the output network of the Train Deep Learning Network to Classify New Images example.
net
net = 
  DAGNetwork with properties:
         Layers: [68x1 nnet.cnn.layer.Layer]
    Connections: [75x2 table]
     InputNames: {'data'}
    OutputNames: {'new_classoutput'}
Define calibration and validation data to use for quantization.
The calibration data is used to collect the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. For the best quantization results, the calibration data must be representative of inputs to the network.
The validation data is used to test the network after quantization to understand the effects of the limited range and precision of the quantized convolution layers in the network.
In this example, use the images in the MerchData data set. Define an augmentedImageDatastore object to resize the data for the network. Then, split the data into calibration and validation data sets.
unzip('MerchData.zip');
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
[calData, valData] = splitEachLabel(imds, 0.7, 'randomized');
aug_calData = augmentedImageDatastore([227 227], calData);
aug_valData = augmentedImageDatastore([227 227], valData);
Create a dlquantizer object and specify the network to quantize.
quantObj = dlquantizer(net);
Define a metric function to use to compare the behavior of the network before and after quantization. Save this function in a local file.
function accuracy = hComputeModelAccuracy(predictionScores, net, dataStore)
%% Computes model-level accuracy statistics

% Load ground truth
tmp = readall(dataStore);
groundTruth = tmp.response;

% Compare predicted label with actual ground truth
predictionError = {};
for idx=1:numel(groundTruth)
    [~, idy] = max(predictionScores(idx,:));
    yActual = net.Layers(end).Classes(idy);
    predictionError{end+1} = (yActual == groundTruth(idx)); %#ok
end

% Fraction of predictions that match the ground truth
predictionError = [predictionError{:}];
accuracy = sum(predictionError)/numel(predictionError);
end
Specify the metric function in a dlquantizationOptions object.
quantOpts = dlquantizationOptions('MetricFcn', ...
    {@(x)hComputeModelAccuracy(x, net, aug_valData)});
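The metric function can also be written without an explicit loop. The sketch below is an alternative, not the helper used in this example. The function name hComputeModelAccuracyVectorized is hypothetical, and the code assumes, as the original helper does, that readall returns a table with a response variable, that predictionScores has one row per observation, and that the last layer of net exposes a Classes property.

```matlab
function accuracy = hComputeModelAccuracyVectorized(predictionScores, net, dataStore)
% Hypothetical vectorized variant of hComputeModelAccuracy (top-1 accuracy).

% Load ground truth labels, one per observation.
tmp = readall(dataStore);
groundTruth = tmp.response;

% Column index of the highest score in each row.
[~, idx] = max(predictionScores, [], 2);

% Map the score indices to class labels and compare with the ground truth.
classes = net.Layers(end).Classes;
predicted = classes(idx);
accuracy = mean(predicted(:) == groundTruth(:));
end
```

To try it, pass the function to dlquantizationOptions in the same way as the original helper, for example {@(x)hComputeModelAccuracyVectorized(x, net, aug_valData)}.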
Use the calibrate function to exercise the network with sample inputs and collect range information. The calibrate function exercises the network and collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. The function returns a table. Each row of the table contains range information for a learnable parameter of the optimized network.
calResults = calibrate(quantObj, aug_calData)
calResults = 95x5 table
                   Optimized Layer Name                      Network Layer Name        Learnables / Activations    MinValue     MaxValue
    __________________________________________________    _________________________    ________________________    _________    ________
    {'conv1_relu_conv1_Weights'                      }    {'relu_conv1'           }    "Weights"                   -0.91985     0.88489
    {'conv1_relu_conv1_Bias'                         }    {'relu_conv1'           }    "Bias"                      -0.07925     0.26343
    {'fire2-squeeze1x1_fire2-relu_squeeze1x1_Weights'}    {'fire2-relu_squeeze1x1'}    "Weights"                   -1.38        1.2477
    {'fire2-squeeze1x1_fire2-relu_squeeze1x1_Bias'   }    {'fire2-relu_squeeze1x1'}    "Bias"                      -0.11641     0.24273
    {'fire2-expand1x1_fire2-relu_expand1x1_Weights'  }    {'fire2-relu_expand1x1' }    "Weights"                   -0.7406      0.90982
    {'fire2-expand1x1_fire2-relu_expand1x1_Bias'     }    {'fire2-relu_expand1x1' }    "Bias"                      -0.060056    0.14602
    {'fire2-expand3x3_fire2-relu_expand3x3_Weights'  }    {'fire2-relu_expand3x3' }    "Weights"                   -0.74397     0.66905
    {'fire2-expand3x3_fire2-relu_expand3x3_Bias'     }    {'fire2-relu_expand3x3' }    "Bias"                      -0.051778    0.074239
    {'fire3-squeeze1x1_fire3-relu_squeeze1x1_Weights'}    {'fire3-relu_squeeze1x1'}    "Weights"                   -0.77263     0.68897
    {'fire3-squeeze1x1_fire3-relu_squeeze1x1_Bias'   }    {'fire3-relu_squeeze1x1'}    "Bias"                      -0.10141     0.32678
    {'fire3-expand1x1_fire3-relu_expand1x1_Weights'  }    {'fire3-relu_expand1x1' }    "Weights"                   -0.72131     0.97287
    {'fire3-expand1x1_fire3-relu_expand1x1_Bias'     }    {'fire3-relu_expand1x1' }    "Bias"                      -0.067043    0.30424
    {'fire3-expand3x3_fire3-relu_expand3x3_Weights'  }    {'fire3-relu_expand3x3' }    "Weights"                   -0.61196     0.77431
    {'fire3-expand3x3_fire3-relu_expand3x3_Bias'     }    {'fire3-relu_expand3x3' }    "Bias"                      -0.053612    0.10329
    {'fire4-squeeze1x1_fire4-relu_squeeze1x1_Weights'}    {'fire4-relu_squeeze1x1'}    "Weights"                   -0.74145     1.0888
    {'fire4-squeeze1x1_fire4-relu_squeeze1x1_Bias'   }    {'fire4-relu_squeeze1x1'}    "Bias"                      -0.10886     0.13882
    ...
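To illustrate what collecting a dynamic range means, the sketch below tracks a running minimum and maximum over several batches of synthetic activations. It is a conceptual illustration only and does not reproduce the internals of calibrate; the batch count, array size, and variable names are assumptions made for the example.

```matlab
rng(0);                 % fixed seed so the synthetic data is repeatable
numBatches = 4;         % hypothetical number of calibration batches
runningMin = inf;
runningMax = -inf;

for b = 1:numBatches
    % Stand-in for the activations of one layer on one batch of inputs.
    activations = randn(8, 8, 16, 'single');

    % Widen the recorded range whenever a batch exceeds it.
    runningMin = min(runningMin, min(activations(:)));
    runningMax = max(runningMax, max(activations(:)));
end

fprintf('Observed range: [%.4f, %.4f]\n', runningMin, runningMax);
```

The MinValue and MaxValue columns in the calibration table play the role of runningMin and runningMax here, which is why calibration data that is not representative of real inputs can produce ranges that are too narrow or too wide.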
Use the validate function to quantize the learnable parameters in the convolution layers of the network and exercise the network. The function uses the metric function defined in the dlquantizationOptions object to compare the results of the network before and after quantization.
valResults = validate(quantObj, aug_valData, quantOpts)
valResults = struct with fields:
       NumSamples: 20
    MetricResults: [1x1 struct]
Examine the MetricResults.Result field of the validation output to see the performance of the quantized network.
valResults.MetricResults.Result
ans = 2x3 table
    NetworkImplementation    MetricOutput    LearnableParameterMemory(bytes)
    _____________________    ____________    _______________________________
    {'Floating-Point'}       1               2.9003e+06
    {'Quantized'     }       1               7.3393e+05
In this example, the memory required for the network was reduced approximately 75% through quantization. The accuracy of the network is not affected.
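The 75% figure can be checked directly from the LearnableParameterMemory(bytes) column. The short calculation below uses the two values from the table above as literals; the variable names are illustrative.

```matlab
% Values copied from the LearnableParameterMemory(bytes) column above.
memFloatingPoint = 2.9003e+06;
memQuantized     = 7.3393e+05;

% Relative reduction in learnable parameter memory.
reduction = 1 - memQuantized / memFloatingPoint;
fprintf('Memory reduction: %.1f%%\n', 100 * reduction);   % prints 74.7%
```

A reduction close to 75% is what you would expect when 32-bit single-precision values are replaced by 8-bit integers.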
The weights, biases, and activations of the convolution layers of the network specified in the dlquantizer object now use scaled 8-bit integer data types.
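To give a feel for what a scaled 8-bit integer representation does to a value, the sketch below quantizes a few numbers with a power-of-two scale derived from a calibrated range and then converts them back. This is an assumption made for illustration, not a description of the exact scheme dlquantizer applies. The range is taken from the first row of the calibration table, and the sample weights are hypothetical.

```matlab
% Dynamic range from the first row of the calibration table.
minVal = -0.91985;
maxVal =  0.88489;

% Assumption: choose a power-of-two scale so that the largest magnitude
% fits in the signed 8-bit range.
absMax  = max(abs([minVal maxVal]));
fracLen = 7 - ceil(log2(absMax));    % number of fractional bits
scale   = 2^(-fracLen);

% Hypothetical weights that lie inside the calibrated range.
w = single([-0.91985 -0.25 0 0.1 0.88489]);

% Quantize: round to the nearest step; int8 saturates at -128 and 127.
q = int8(round(w / scale));

% Convert back to floating point to see the rounding error.
wHat   = single(q) * scale;
maxErr = max(abs(w - wHat));

fprintf('Scale: 2^-%d, largest rounding error: %g\n', fracLen, maxErr);
```

Each stored value occupies one byte instead of four, and for values inside the representable range the rounding error is at most half of the scale.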
This example shows how to quantize learnable parameters in the convolution layers of a neural network, and explore the behavior of the quantized network. In this example, you quantize the LogoNet neural network. Quantization helps reduce the memory requirement of a deep neural network by quantizing weights, biases and activations of network layers to 8-bit scaled integer data types. Use MATLAB® to retrieve the prediction results from the target device.
To run this example, you need the products listed under FPGA in Quantization Workflow Prerequisites.
For additional requirements, see Quantization Workflow Prerequisites.
Create a file in your current working directory called getLogoNetwork.m. Enter these lines into the file:
function net = getLogoNetwork()
    data = getLogoData();
    net = data.convnet;
end

function data = getLogoData()
    if ~isfile('LogoNet.mat')
        url = 'https://www.mathworks.com/supportfiles/gpucoder/cnn_models/logo_detection/LogoNet.mat';
        websave('LogoNet.mat',url);
    end
    data = load('LogoNet.mat');
end
Load the pretrained network.
snet = getLogoNetwork();
snet = 
  SeriesNetwork with properties:
         Layers: [22×1 nnet.cnn.layer.Layer]
     InputNames: {'imageinput'}
    OutputNames: {'classoutput'}
Define calibration and validation data to use for quantization.
The calibration data is used to collect the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. For the best quantization results, the calibration data must be representative of inputs to the network.
The validation data is used to test the network after quantization to understand the effects of the limited range and precision of the quantized convolution layers in the network.
This example uses the images in the logos_dataset data set. Define an augmentedImageDatastore object to resize the data for the network. Then, split the data into calibration and validation data sets.
curDir = pwd;
newDir = fullfile(matlabroot,'examples','deeplearning_shared','data','logos_dataset.zip');
copyfile(newDir,curDir);
unzip('logos_dataset.zip');
imageData = imageDatastore(fullfile(curDir,'logos_dataset'),...
    'IncludeSubfolders',true,'FileExtensions','.JPG','LabelSource','foldernames');
[calibrationData, validationData] = splitEachLabel(imageData, 0.5,'randomized');
Create a dlquantizer object and specify the network to quantize.
dlQuantObj = dlquantizer(snet,'ExecutionEnvironment','FPGA');
Use the calibrate function to exercise the network with sample inputs and collect range information. The calibrate function exercises the network and collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. The function returns a table. Each row of the table contains range information for a learnable parameter of the optimized network.
dlQuantObj.calibrate(calibrationData)
ans = 
        Optimized Layer Name        Network Layer Name    Learnables / Activations     MinValue       MaxValue
    ____________________________    __________________    ________________________    ___________    __________
    {'conv_1_Weights'          }    {'conv_1'    }        "Weights"                   -0.048978      0.039352
    {'conv_1_Bias'             }    {'conv_1'    }        "Bias"                      0.99996        1.0028
    {'conv_2_Weights'          }    {'conv_2'    }        "Weights"                   -0.055518      0.061901
    {'conv_2_Bias'             }    {'conv_2'    }        "Bias"                      -0.00061171    0.00227
    {'conv_3_Weights'          }    {'conv_3'    }        "Weights"                   -0.045942      0.046927
    {'conv_3_Bias'             }    {'conv_3'    }        "Bias"                      -0.0013998     0.0015218
    {'conv_4_Weights'          }    {'conv_4'    }        "Weights"                   -0.045967      0.051
    {'conv_4_Bias'             }    {'conv_4'    }        "Bias"                      -0.00164       0.0037892
    {'fc_1_Weights'            }    {'fc_1'      }        "Weights"                   -0.051394      0.054344
    {'fc_1_Bias'               }    {'fc_1'      }        "Bias"                      -0.00052319    0.00084454
    {'fc_2_Weights'            }    {'fc_2'      }        "Weights"                   -0.05016       0.051557
    {'fc_2_Bias'               }    {'fc_2'      }        "Bias"                      -0.0017564     0.0018502
    {'fc_3_Weights'            }    {'fc_3'      }        "Weights"                   -0.050706      0.04678
    {'fc_3_Bias'               }    {'fc_3'      }        "Bias"                      -0.02951       0.024855
    {'imageinput'              }    {'imageinput'}        "Activations"               0              255
    {'imageinput_normalization'}    {'imageinput'}        "Activations"               -139.34        198.72
Create a target object with a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet. To create the target object, enter:
hTarget = dlhdl.Target('Intel', 'Interface', 'JTAG');
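The paragraph above lists Ethernet as the other interface option. A minimal sketch of that variant follows. It assumes a Xilinx board that is reachable over the network and the IPAddress name-value pair of dlhdl.Target; the address shown is a placeholder, not a value from this example, and the variable names are illustrative.

```matlab
% Placeholder address: replace with the IP address of your own board.
boardIP = '192.168.1.101';

% Create a target object that communicates with the board over Ethernet
% instead of JTAG. hTargetEth is a hypothetical variable name.
hTargetEth = dlhdl.Target('Xilinx', 'Interface', 'Ethernet', 'IPAddress', boardIP);
```

If you use this variant, pass hTargetEth as the 'Target' value and choose a bitstream that matches the board.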
Define a metric function to use to compare the behavior of the network before and after quantization. Save this function in a local file.
function accuracy = hComputeModelAccuracy(predictionScores, net, dataStore)
%% hComputeModelAccuracy test helper function computes model level accuracy statistics

% Copyright 2020 The MathWorks, Inc.

% Load ground truth
groundTruth = dataStore.Labels;

% Compare predicted label with actual ground truth
predictionError = {};
for idx=1:numel(groundTruth)
    [~, idy] = max(predictionScores(idx, :));
    yActual = net.Layers(end).Classes(idy);
    predictionError{end+1} = (yActual == groundTruth(idx)); %#ok
end

% Fraction of predictions that match the ground truth
predictionError = [predictionError{:}];
accuracy = sum(predictionError)/numel(predictionError);
end
Specify the metric function in a dlquantizationOptions object.
options = dlquantizationOptions('MetricFcn', ...
    {@(x)hComputeModelAccuracy(x, snet, validationData)},'Bitstream','arria10soc_int8',...
    'Target',hTarget);
To compile and deploy the quantized network, run the validate function of the dlquantizer object. The validate function quantizes the learnable parameters in the convolution layers of the network and exercises the network. This function uses the output of the compile function to program the FPGA board by using the programming file. It also downloads the network weights and biases. The deploy function checks for the Intel Quartus tool and the supported tool version. It then programs the FPGA device by using the sof file, and displays progress messages and the time it takes to deploy the network. The validate function uses the metric function defined in the dlquantizationOptions object to compare the results of the network before and after quantization.
prediction = dlQuantObj.validate(validationData,options);
          offset_name          offset_address     allocated_space 
    _______________________    ______________    _________________
    "InputDataOffset"           "0x00000000"     "48.0 MB"        
    "OutputResultOffset"        "0x03000000"     "4.0 MB"         
    "SystemBufferOffset"        "0x03400000"     "60.0 MB"        
    "InstructionDataOffset"     "0x07000000"     "8.0 MB"         
    "ConvWeightDataOffset"      "0x07800000"     "8.0 MB"         
    "FCWeightDataOffset"        "0x08000000"     "12.0 MB"        
    "EndOffset"                 "0x08c00000"     "Total: 140.0 MB"

### Programming FPGA Bitstream using JTAG...
### Programming the FPGA bitstream has been completed successfully.
### Loading weights to Conv Processor.
### Conv Weights loaded. Current time is 16-Jul-2020 12:45:10
### Loading weights to FC Processor.
### FC Weights loaded. Current time is 16-Jul-2020 12:45:26
### Finished writing input activations.
### Running single input activations.

Deep Learning Processor Profiler Performance Results
                  LastLayerLatency(cycles)  LastLayerLatency(seconds)  FramesNum  Total Latency  Frames/s
                       -------------             -------------         ---------    ---------    ---------
Network                  13570959                  0.09047                30        380609145      11.8
    conv_module          12667786                  0.08445
        conv_1            3938907                  0.02626
        maxpool_1         1544560                  0.01030
        conv_2            2910954                  0.01941
        maxpool_2          577524                  0.00385
        conv_3            2552707                  0.01702
        maxpool_3          676542                  0.00451
        conv_4             455434                  0.00304
        maxpool_4           11251                  0.00008
    fc_module              903173                  0.00602
        fc_1               536164                  0.00357
        fc_2               342643                  0.00228
        fc_3                24364                  0.00016
 * The clock frequency of the DL processor is: 150MHz

### Finished writing input activations.
### Running single input activations.

Deep Learning Processor Profiler Performance Results
                  LastLayerLatency(cycles)  LastLayerLatency(seconds)  FramesNum  Total Latency  Frames/s
                       -------------             -------------         ---------    ---------    ---------
Network                  13570364                  0.09047                30        380612682      11.8
    conv_module          12667103                  0.08445
        conv_1            3939296                  0.02626
        maxpool_1         1544371                  0.01030
        conv_2            2910747                  0.01940
        maxpool_2          577654                  0.00385
        conv_3            2551829                  0.01701
        maxpool_3          676548                  0.00451
        conv_4             455396                  0.00304
        maxpool_4           11355                  0.00008
    fc_module              903261                  0.00602
        fc_1               536206                  0.00357
        fc_2               342688                  0.00228
        fc_3                24365                  0.00016
 * The clock frequency of the DL processor is: 150MHz

### Finished writing input activations.
### Running single input activations.

Deep Learning Processor Profiler Performance Results
                  LastLayerLatency(cycles)  LastLayerLatency(seconds)  FramesNum  Total Latency  Frames/s
                       -------------             -------------         ---------    ---------    ---------
Network                  13571561                  0.09048                30        380608338      11.8
    conv_module          12668340                  0.08446
        conv_1            3939070                  0.02626
        maxpool_1         1545327                  0.01030
        conv_2            2911061                  0.01941
        maxpool_2          577557                  0.00385
        conv_3            2552082                  0.01701
        maxpool_3          676506                  0.00451
        conv_4             455582                  0.00304
        maxpool_4           11248                  0.00007
    fc_module              903221                  0.00602
        fc_1               536167                  0.00357
        fc_2               342643                  0.00228
        fc_3                24409                  0.00016
 * The clock frequency of the DL processor is: 150MHz

### Finished writing input activations.
### Running single input activations.

Deep Learning Processor Profiler Performance Results
                  LastLayerLatency(cycles)  LastLayerLatency(seconds)  FramesNum  Total Latency  Frames/s
                       -------------             -------------         ---------    ---------    ---------
Network                  13569862                  0.09047                30        380613327      11.8
    conv_module          12666756                  0.08445
        conv_1            3939212                  0.02626
        maxpool_1         1543267                  0.01029
        conv_2            2911184                  0.01941
        maxpool_2          577275                  0.00385
        conv_3            2552868                  0.01702
        maxpool_3          676438                  0.00451
        conv_4             455353                  0.00304
        maxpool_4           11252                  0.00008
    fc_module              903106                  0.00602
        fc_1               536050                  0.00357
        fc_2               342645                  0.00228
        fc_3                24409                  0.00016
 * The clock frequency of the DL processor is: 150MHz

### Finished writing input activations.
### Running single input activations.

Deep Learning Processor Profiler Performance Results
                  LastLayerLatency(cycles)  LastLayerLatency(seconds)  FramesNum  Total Latency  Frames/s
                       -------------             -------------         ---------    ---------    ---------
Network                  13570823                  0.09047                30        380619836      11.8
    conv_module          12667607                  0.08445
        conv_1            3939074                  0.02626
        maxpool_1         1544519                  0.01030
        conv_2            2910636                  0.01940
        maxpool_2          577769                  0.00385
        conv_3            2551800                  0.01701
        maxpool_3          676795                  0.00451
        conv_4             455859                  0.00304
        maxpool_4           11248                  0.00007
    fc_module              903216                  0.00602
        fc_1               536165                  0.00357
        fc_2               342643                  0.00228
        fc_3                24406                  0.00016
 * The clock frequency of the DL processor is: 150MHz

          offset_name          offset_address     allocated_space 
    _______________________    ______________    _________________
    "InputDataOffset"           "0x00000000"     "48.0 MB"        
    "OutputResultOffset"        "0x03000000"     "4.0 MB"         
    "SystemBufferOffset"        "0x03400000"     "60.0 MB"        
    "InstructionDataOffset"     "0x07000000"     "8.0 MB"         
    "ConvWeightDataOffset"      "0x07800000"     "8.0 MB"         
    "FCWeightDataOffset"        "0x08000000"     "12.0 MB"        
    "EndOffset"                 "0x08c00000"     "Total: 140.0 MB"

### FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA.
### Deep learning network programming has been skipped as the same network is already loaded on the target FPGA.
### Finished writing input activations.
### Running single input activations.

Deep Learning Processor Profiler Performance Results
                  LastLayerLatency(cycles)  LastLayerLatency(seconds)  FramesNum  Total Latency  Frames/s
                       -------------             -------------         ---------    ---------    ---------
Network                  13572329                  0.09048                10        127265075      11.8
    conv_module          12669135                  0.08446
        conv_1            3939559                  0.02626
        maxpool_1         1545378                  0.01030
        conv_2            2911243                  0.01941
        maxpool_2          577422                  0.00385
        conv_3            2552064                  0.01701
        maxpool_3          676678                  0.00451
        conv_4             455657                  0.00304
        maxpool_4           11227                  0.00007
    fc_module              903194                  0.00602
        fc_1               536140                  0.00357
        fc_2               342688                  0.00228
        fc_3                24364                  0.00016
 * The clock frequency of the DL processor is: 150MHz

### Finished writing input activations.
### Running single input activations.

Deep Learning Processor Profiler Performance Results
                  LastLayerLatency(cycles)  LastLayerLatency(seconds)  FramesNum  Total Latency  Frames/s
                       -------------             -------------         ---------    ---------    ---------
Network                  13572527                  0.09048                10        127266427      11.8
    conv_module          12669266                  0.08446
        conv_1            3939776                  0.02627
        maxpool_1         1545632                  0.01030
        conv_2            2911169                  0.01941
        maxpool_2          577592                  0.00385
        conv_3            2551613                  0.01701
        maxpool_3          676811                  0.00451
        conv_4             455418                  0.00304
        maxpool_4           11348                  0.00008
    fc_module              903261                  0.00602
        fc_1               536205                  0.00357
        fc_2               342689                  0.00228
        fc_3                24365                  0.00016
 * The clock frequency of the DL processor is: 150MHz
Examine the MetricResults.Result field of the validation output to see the performance of the quantized network.
validateOut = prediction.MetricResults.Result
validateOut = 
    NetworkImplementation    MetricOutput
    _____________________    ____________
    {'Floating-Point'}       0.9875
    {'Quantized'     }       0.9875
Examine the QuantizedNetworkFPS field of the validation output to see the frames per second performance of the quantized network.
prediction.QuantizedNetworkFPS
ans = 11.8126
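The frames-per-second figure is consistent with the profiler columns shown earlier. The sketch below recomputes the latency and throughput from the first profiler table, using the 150 MHz clock frequency that the log reports. The relationship between the columns is inferred from the numbers rather than stated by the log, and the variable names are illustrative.

```matlab
% Values copied from the Network row of the first profiler table.
clockHz         = 150e6;       % DL processor clock frequency
lastLayerCycles = 13570959;    % LastLayerLatency(cycles)
framesNum       = 30;          % FramesNum
totalCycles     = 380609145;   % Total Latency (cycles)

% Latency of one frame in seconds, and throughput over all frames.
latencySeconds  = lastLayerCycles / clockHz;
framesPerSecond = framesNum * clockHz / totalCycles;

% Prints: Latency: 0.09047 s, throughput: 11.8 frames/s
fprintf('Latency: %.5f s, throughput: %.1f frames/s\n', ...
    latencySeconds, framesPerSecond);
```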
The weights, biases, and activations of the convolution layers of the network specified in the dlquantizer object now use scaled 8-bit integer data types.
Import a dlquantizer Object into the Deep Network Quantizer App
This example shows you how to import a dlquantizer object from the base workspace into the Deep Network Quantizer app. This allows you to begin quantization of a deep neural network using the command line or the app, and resume your work later in the app.
Load the network to quantize into the base workspace.
net
net = 
  DAGNetwork with properties:
         Layers: [68x1 nnet.cnn.layer.Layer]
    Connections: [75x2 table]
     InputNames: {'data'}
    OutputNames: {'new_classoutput'}
Define calibration and validation data to use for quantization.
The calibration data is used to collect the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. For the best quantization results, the calibration data must be representative of inputs to the network.
The validation data is used to test the network after quantization to understand the effects of the limited range and precision of the quantized convolution layers in the network.
In this example, use the images in the MerchData data set. Define an augmentedImageDatastore object to resize the data for the network. Then, split the data into calibration and validation data sets.
unzip('MerchData.zip');
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
[calData, valData] = splitEachLabel(imds, 0.7, 'randomized');
aug_calData = augmentedImageDatastore([227 227], calData);
aug_valData = augmentedImageDatastore([227 227], valData);
Create a dlquantizer object and specify the network to quantize.
quantObj = dlquantizer(net);
Use the calibrate function to exercise the network with sample inputs and collect range information. The calibrate function exercises the network and collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. The function returns a table. Each row of the table contains range information for a learnable parameter of the optimized network.
calResults = calibrate(quantObj, aug_calData)
calResults = 95x5 table
                   Optimized Layer Name                      Network Layer Name        Learnables / Activations    MinValue     MaxValue
    __________________________________________________    _________________________    ________________________    _________    ________
    {'conv1_relu_conv1_Weights'                      }    {'relu_conv1'           }    "Weights"                   -0.91985     0.88489
    {'conv1_relu_conv1_Bias'                         }    {'relu_conv1'           }    "Bias"                      -0.07925     0.26343
    {'fire2-squeeze1x1_fire2-relu_squeeze1x1_Weights'}    {'fire2-relu_squeeze1x1'}    "Weights"                   -1.38        1.2477
    {'fire2-squeeze1x1_fire2-relu_squeeze1x1_Bias'   }    {'fire2-relu_squeeze1x1'}    "Bias"                      -0.11641     0.24273
    {'fire2-expand1x1_fire2-relu_expand1x1_Weights'  }    {'fire2-relu_expand1x1' }    "Weights"                   -0.7406      0.90982
    {'fire2-expand1x1_fire2-relu_expand1x1_Bias'     }    {'fire2-relu_expand1x1' }    "Bias"                      -0.060056    0.14602
    {'fire2-expand3x3_fire2-relu_expand3x3_Weights'  }    {'fire2-relu_expand3x3' }    "Weights"                   -0.74397     0.66905
    {'fire2-expand3x3_fire2-relu_expand3x3_Bias'     }    {'fire2-relu_expand3x3' }    "Bias"                      -0.051778    0.074239
    {'fire3-squeeze1x1_fire3-relu_squeeze1x1_Weights'}    {'fire3-relu_squeeze1x1'}    "Weights"                   -0.77262     0.68583
    {'fire3-squeeze1x1_fire3-relu_squeeze1x1_Bias'   }    {'fire3-relu_squeeze1x1'}    "Bias"                      -0.10145     0.32669
    {'fire3-expand1x1_fire3-relu_expand1x1_Weights'  }    {'fire3-relu_expand1x1' }    "Weights"                   -0.72083     0.97157
    {'fire3-expand1x1_fire3-relu_expand1x1_Bias'     }    {'fire3-relu_expand1x1' }    "Bias"                      -0.067019    0.30422
    {'fire3-expand3x3_fire3-relu_expand3x3_Weights'  }    {'fire3-relu_expand3x3' }    "Weights"                   -0.61403     0.77544
    {'fire3-expand3x3_fire3-relu_expand3x3_Bias'     }    {'fire3-relu_expand3x3' }    "Bias"                      -0.053621    0.1033
    {'fire4-squeeze1x1_fire4-relu_squeeze1x1_Weights'}    {'fire4-relu_squeeze1x1'}    "Weights"                   -0.74164     1.0865
    {'fire4-squeeze1x1_fire4-relu_squeeze1x1_Bias'   }    {'fire4-relu_squeeze1x1'}    "Bias"                      -0.10885     0.13875
    ...
Open the Deep Network Quantizer app.
deepNetworkQuantizer
In the app, click New and select Import dlquantizer object.
In the dialog, select the dlquantizer object to import from the base workspace.
The app imports any data contained in the dlquantizer object that was collected at the command line. This data can include the network to quantize, calibration data, validation data, and calibration statistics.
The app displays a table containing the calibration data contained in the imported dlquantizer object, quantObj. To the right of the table, the app displays histograms of the dynamic ranges of the parameters. The gray regions of the histograms indicate data that cannot be represented by the quantized representation. For more information on how to interpret these histograms, see Quantization of Deep Neural Networks.
To explore the behavior of a neural network that has quantized convolution layers, use the Deep Network Quantizer app. This example quantizes the learnable parameters of the convolution layers of the LogoNet neural network.
For this example, you need the products listed under FPGA in Quantization Workflow Prerequisites.
For additional requirements, see Quantization Workflow Prerequisites.
Create a file in your current working folder called getLogoNetwork.m. In the file, enter:
function net = getLogoNetwork()
    if ~isfile('LogoNet.mat')
        url = 'https://www.mathworks.com/supportfiles/gpucoder/cnn_models/logo_detection/LogoNet.mat';
        websave('LogoNet.mat',url);
    end
    data = load('LogoNet.mat');
    net = data.convnet;
end
Load the pretrained network.
snet = getLogoNetwork();
snet = 
  SeriesNetwork with properties:
         Layers: [22×1 nnet.cnn.layer.Layer]
     InputNames: {'imageinput'}
    OutputNames: {'classoutput'}
Define calibration and validation data to use for quantization.
The app uses calibration data to exercise the network and collect the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network. The app also collects the dynamic ranges of the activations in all layers of the LogoNet network. For the best quantization results, the calibration data must be representative of inputs to the LogoNet network.
After quantization, the app uses the validation data set to test the network to understand the effects of the limited range and precision of the quantized learnable parameters of the convolution layers in the network.
In this example, use the images in the logos_dataset data set to calibrate and validate the LogoNet network. Define an augmentedImageDatastore object to resize the data for the network. Then, split the data into calibration and validation data sets.
Expedite the calibration and validation process by using a subset of the calibrationData and validationData. Store the new reduced calibration data set in calibrationData_concise and the new reduced validation data set in validationData_concise.
curDir = pwd;
newDir = fullfile(matlabroot,'examples','deeplearning_shared','data','logos_dataset.zip');
copyfile(newDir,curDir);
unzip('logos_dataset.zip');
imageData = imageDatastore(fullfile(curDir,'logos_dataset'),...
    'IncludeSubfolders',true,'FileExtensions','.JPG','LabelSource','foldernames');
[calibrationData, validationData] = splitEachLabel(imageData, 0.5,'randomized');
calibrationData_concise = calibrationData.subset(1:20);
validationData_concise = validationData.subset(1:1);
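The text above mentions an augmentedImageDatastore object for resizing, while the code creates image datastores. If resizing is needed, one possible way to wrap the reduced data sets is sketched below. It assumes that snet, calibrationData_concise, and validationData_concise already exist in the workspace and that the first layer of snet is an image input layer; the variable names aug_calibrationData and aug_validationData are hypothetical.

```matlab
% Read the expected image size from the image input layer of the network.
% Assumption: snet.Layers(1) is an image input layer with an InputSize property.
inputSize = snet.Layers(1).InputSize;    % [height width channels]

% Wrap the reduced data sets so that images are resized when they are read.
% The variable names below are hypothetical.
aug_calibrationData = augmentedImageDatastore(inputSize(1:2), calibrationData_concise);
aug_validationData  = augmentedImageDatastore(inputSize(1:2), validationData_concise);
```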
At the MATLAB command prompt, open the Deep Network Quantizer app.
deepNetworkQuantizer
Click New and select Quantize a network.
The app verifies your execution environment.
Select the execution environment and the network to quantize from the base workspace. For this example, select an FPGA execution environment and the series network snet.
The app displays the layer graph of the selected network.
In the Calibrate section of the app toolstrip, under Calibration Data, select the augmentedImageDatastore object from the base workspace containing the calibration data calibrationData_concise.
Click Calibrate.
The Deep Network Quantizer app uses the calibration data to exercise the network and collect range information for the learnable parameters in the network layers.
When the calibration is complete, the app displays a table containing the weights and biases in the convolution and fully connected layers of the network. Also displayed are the dynamic ranges of the activations in all layers of the network and their minimum and maximum values during the calibration. The app displays histograms of the dynamic ranges of the parameters. The gray regions of the histograms indicate data that cannot be represented by the quantized representation. For more information on how to interpret these histograms, see Quantization of Deep Neural Networks.
In the Quantize column of the table, indicate whether to quantize the learnable parameters in the layer. You cannot quantize layers that are not convolution layers. Layers that are not quantized remain in single-precision.
In the Validate section of the app toolstrip, under Validation Data, select the augmentedImageDatastore object from the base workspace containing the validation data validationData_concise.
In the Hardware Settings section of the toolstrip, select from the options listed in the table:
Simulation Environment | Action |
---|---|
MATLAB (Simulate in MATLAB) | Simulates the quantized network in MATLAB. Validates the quantized network by comparing performance to the single-precision version of the network. |
Intel Arria 10 SoC (arria10soc_int8) | Deploys the quantized network to an Intel® Arria® 10 SoC board by using the arria10soc_int8 bitstream. |
Xilinx ZCU102 (zcu102_int8) | Deploys the quantized network to a Xilinx® Zynq® UltraScale+™ MPSoC ZCU102 board by using the zcu102_int8 bitstream. |
Xilinx ZC706 (zc706_int8) | Deploys the quantized network to a Xilinx Zynq-7000 ZC706 board by using the zc706_int8 bitstream. |
When you select the Intel Arria 10 SoC (arria10soc_int8), Xilinx ZCU102 (zcu102_int8), or Xilinx ZC706 (zc706_int8) options, select the interface to use to deploy and validate the quantized network. The Target interface options are listed in this table.
Target Option | Action |
---|---|
JTAG | Programs the target FPGA board selected in Simulation Environment by using a JTAG cable. For more information, see JTAG Connection (Deep Learning HDL Toolbox). |
Ethernet | Programs the target FPGA board selected in Simulation Environment through the Ethernet interface. Specify the IP address for your target board in IP Address. |
For this example, select Xilinx ZCU102 (zcu102_int8), select Ethernet, and enter the board IP address.
In the Validate section of the app toolstrip, under Quantization Options, select the Default metric function.
Click Quantize and Validate.
The Deep Network Quantizer app quantizes the weights, activations, and biases of convolution layers in the network to scaled 8-bit integer data types and uses the validation data to exercise the network. The app determines a metric function to use for the validation based on the type of network that is being quantized.
Type of Network | Metric Function |
---|---|
Classification | Top-1 Accuracy – Accuracy of the network |
Object Detection | Average Precision – Average precision over all detection results. See |
Regression | MSE – Mean squared error of the network |
Semantic Segmentation | evaluateSemanticSegmentation (Computer Vision Toolbox) – Evaluate semantic segmentation data set against ground truth |
Single Shot Detector (SSD) | WeightedIOU – Average IoU of each class, weighted by the number of pixels in that class |
When the validation is complete, the app displays the results of the validation, including:
Metric function used for validation
Result of the metric function before and after quantization
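The same quantize-and-validate step can also be run from the MATLAB command line. The sketch below assumes that Deep Learning HDL Toolbox is installed and that `net`, `calibrationData`, and `validationData` already exist in the workspace. Those three variable names and the IP address are placeholders, not values from this example.

```matlab
% Assumptions: net, calibrationData, and validationData are hypothetical
% workspace variables, and the IP address below is a placeholder for your board.
hTarget = dlhdl.Target('Xilinx', 'Interface', 'Ethernet', ...
    'IPAddress', '192.168.1.101');

% Create the quantizer for FPGA deployment and collect dynamic ranges.
quantObj = dlquantizer(net, 'ExecutionEnvironment', 'FPGA');
calibrate(quantObj, calibrationData);

% Validate on the ZCU102 bitstream with the default metric function.
options = dlquantizationOptions('Bitstream', 'zcu102_int8', ...
    'Target', hTarget);
results = validate(quantObj, validationData, options);

% Metric result before and after quantization.
disp(results.MetricResults.Result)
```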
If you want to use a different metric function for validation, for example to use the Top-5 accuracy metric function instead of the default Top-1 accuracy metric function, you can define a custom metric function. Save this function in a local file.
function accuracy = hComputeAccuracy(predictionScores, net, dataStore)
%% Computes model-level accuracy statistics

    % Load ground truth
    tmp = readall(dataStore);
    groundTruth = tmp.response;

    % Compare predicted label with actual ground truth
    predictionError = {};
    for idx=1:numel(groundTruth)
        [~, idy] = max(predictionScores(idx,:));
        yActual = net.Layers(end).Classes(idy);
        predictionError{end+1} = (yActual == groundTruth(idx)); %#ok
    end

    % Sum all prediction errors.
    predictionError = [predictionError{:}];
    accuracy = sum(predictionError)/numel(predictionError);
end
To revalidate the network by using this custom metric function, under Quantization Options, enter the name of the custom metric function, hComputeAccuracy. Select Add to add hComputeAccuracy to the list of metric functions available in the app. Select hComputeAccuracy as the metric function to use.
The custom metric function must be on the path. If the metric function is not on the path, this step produces an error.
Click Quantize and Validate.
The app quantizes the network and displays the validation results for the custom metric function.
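The hComputeAccuracy listing picks the single highest score per observation, so it measures Top-1 accuracy. A Top-5 variant, as mentioned earlier, needs only a different ranking step. The following is a minimal sketch of that step on a hypothetical score matrix, with ground truth given as class indices. All values and variable names here are illustrative assumptions. In a real metric function the indices would have to be derived from the network's class list and the datastore responses.

```matlab
% Hypothetical prediction scores: 3 observations (rows) by 6 classes (columns).
scores = [0.05 0.10 0.40 0.20 0.15 0.10;
          0.30 0.25 0.05 0.10 0.20 0.10;
          0.02 0.03 0.05 0.10 0.20 0.60];

% Hypothetical ground truth, expressed as class indices (one per observation).
truthIdx = [3; 3; 1];

% Rank the classes of each observation by descending score and keep the top k.
k = 5;
[~, order] = sort(scores, 2, 'descend');
topK = order(:, 1:k);

% An observation counts as correct if its true class is among the top k.
isHit = any(topK == truthIdx, 2);
top5Accuracy = mean(isHit);
```

Because the result is a scalar, a metric built this way appears in the app's validation results table like the default metric.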
The app displays only scalar values in the validation results table. To view the validation results for a custom metric function with nonscalar output, export the dlquantizer object, then validate the quantized network by using the validate function in the MATLAB command window.
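A sketch of that command-line validation follows. It assumes the exported dlquantizer object is named `quantObj`, that `net` and `validationData` are in the workspace, and that hComputeAccuracy is on the path. The variable names and the IP address are placeholders.

```matlab
% Assumptions: quantObj is the exported dlquantizer object; net and
% validationData are hypothetical workspace variables; the IP address is a
% placeholder for your board.
hTarget = dlhdl.Target('Xilinx', 'Interface', 'Ethernet', ...
    'IPAddress', '192.168.1.101');

% Pass the custom metric as a function handle that takes the prediction scores.
options = dlquantizationOptions( ...
    'MetricFcn', {@(x) hComputeAccuracy(x, net, validationData)}, ...
    'Bitstream', 'zcu102_int8', ...
    'Target', hTarget);

results = validate(quantObj, validationData, options);

% Inspect the metric output, which can be nonscalar here.
results.MetricResults.Result
```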
After quantizing and validating the network, you can choose to export the quantized network.
Click the Export button. In the drop-down list, select Export Quantizer to create a dlquantizer object in the base workspace. You can deploy the quantized network to your target FPGA board and retrieve the prediction results by using MATLAB. See Deploy Quantized Network Example (Deep Learning HDL Toolbox).
If the performance of the quantized network is not satisfactory, you can choose to not quantize some layers by clearing the layer in the table. Click Quantize and Validate again.