Bicyclist and Pedestrian Classification by Using FPGA
This example shows how to deploy a custom-trained network to detect pedestrians and bicyclists based on their micro-Doppler signatures. This network is taken from the Pedestrian and Bicyclist Classification Using Deep Learning example from the Radar Toolbox. For more details on network training and input data, see Pedestrian and Bicyclist Classification Using Deep Learning (Radar Toolbox).
Prerequisites
Zynq® UltraScale+™ MPSoC ZCU102 Evaluation Kit
Deep Learning HDL Toolbox™ Support Package for Xilinx® FPGA and SoC Devices
Deep Learning Toolbox™
Deep Learning HDL Toolbox™
The data files used in this example are:
The MAT file trainedNetBicPed.mat contains a model trained on the training data set trainDataNoCar and its label set trainLabelNoCar.
The MAT file testDataBicPed.mat contains the test data set testDataNoCar and its label set testLabelNoCar.
Load Data and Network
Load the pretrained network. Load test data and its labels.
load('trainedNetBicPed.mat','trainedNetNoCar')
load('testDataBicPed.mat')
View the layers of the pretrained network:
deepNetworkDesigner(trainedNetNoCar);
Set up HDL Toolpath
Set up the path to your installed Xilinx™ Vivado™ Design Suite 2023.1 executable if it is not already set up. For example, to set the toolpath, enter:
% hdlsetuptoolpath('ToolName', 'Xilinx Vivado','ToolPath', 'C:\Vivado\2023.1\bin');
Create Target Object
Create a target object for your target device with a vendor name and an interface to connect your target device to the host computer. Interface options are JTAG (default) and Ethernet. Vendor options are Intel or Xilinx. Use the installed Xilinx Vivado Design Suite over an Ethernet connection to program the device.
hT = dlhdl.Target('Xilinx', 'Interface', 'Ethernet');
Create Workflow Object
Create an object of the dlhdl.Workflow class. When you create the object, specify the network and the bitstream name. Specify the saved pretrained network, trainedNetNoCar, as the network. Make sure the bitstream name matches the data type and the FPGA board that you are targeting. In this example, the target FPGA board is the Zynq UltraScale+ MPSoC ZCU102 board. The bitstream uses the single data type.
hW = dlhdl.Workflow('Network', trainedNetNoCar, 'Bitstream', 'zcu102_single', 'Target', hT);
Compile trainedNetNoCar Network
To compile the trainedNetNoCar network, run the compile function of the dlhdl.Workflow object.
dn = hW.compile;
### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream zcu102_single.
### An output layer called 'Output1_softmax' of type 'nnet.cnn.layer.RegressionOutputLayer' has been added to the provided network. This layer performs no operation during prediction and thus does not affect the output of the network.
### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### The network includes the following layers:
     1   'imageinput'        Image Input            400×144×1 images                                                     (SW Layer)
     2   'conv_1'            2-D Convolution        16 10×10×1 convolutions with stride [1 1] and padding 'same'         (HW Layer)
     3   'relu_1'            ReLU                   ReLU                                                                 (HW Layer)
     4   'maxpool_1'         2-D Max Pooling        10×10 max pooling with stride [2 2] and padding [0 0 0 0]            (HW Layer)
     5   'conv_2'            2-D Convolution        32 5×5×16 convolutions with stride [1 1] and padding 'same'          (HW Layer)
     6   'relu_2'            ReLU                   ReLU                                                                 (HW Layer)
     7   'maxpool_2'         2-D Max Pooling        10×10 max pooling with stride [2 2] and padding [0 0 0 0]            (HW Layer)
     8   'conv_3'            2-D Convolution        32 5×5×32 convolutions with stride [1 1] and padding 'same'          (HW Layer)
     9   'relu_3'            ReLU                   ReLU                                                                 (HW Layer)
    10   'maxpool_3'         2-D Max Pooling        10×10 max pooling with stride [2 2] and padding [0 0 0 0]            (HW Layer)
    11   'conv_4'            2-D Convolution        32 5×5×32 convolutions with stride [1 1] and padding 'same'          (HW Layer)
    12   'relu_4'            ReLU                   ReLU                                                                 (HW Layer)
    13   'maxpool_4'         2-D Max Pooling        5×5 max pooling with stride [2 2] and padding [0 0 0 0]              (HW Layer)
    14   'conv_5'            2-D Convolution        32 5×5×32 convolutions with stride [1 1] and padding 'same'          (HW Layer)
    15   'relu_5'            ReLU                   ReLU                                                                 (HW Layer)
    16   'avgpool2d'         2-D Average Pooling    2×2 average pooling with stride [2 2] and padding [0 0 0 0]          (HW Layer)
    17   'fc'                Fully Connected        5 fully connected layer                                              (HW Layer)
    18   'softmax'           Softmax                softmax                                                              (SW Layer)
    19   'Output1_softmax'   Regression Output      mean-squared-error                                                   (SW Layer)

### Notice: The layer 'imageinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'Output1_softmax' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.
### Compiling layer group: conv_1>>relu_5 ...
### Compiling layer group: conv_1>>relu_5 ... complete.
### Compiling layer group: avgpool2d ...
### Compiling layer group: avgpool2d ... complete.
### Compiling layer group: fc ...
### Compiling layer group: fc ... complete.

### Allocating external memory buffers:

          offset_name          offset_address    allocated_space
    _______________________    ______________    ________________

    "InputDataOffset"           "0x00000000"     "26.4 MB"
    "OutputResultOffset"        "0x01a5e000"     "4.0 kB"
    "SchedulerDataOffset"       "0x01a5f000"     "72.0 kB"
    "SystemBufferOffset"        "0x01a71000"     "7.1 MB"
    "InstructionDataOffset"     "0x0217e000"     "1020.0 kB"
    "ConvWeightDataOffset"      "0x0227d000"     "888.0 kB"
    "FCWeightDataOffset"        "0x0235b000"     "24.0 kB"
    "EndOffset"                 "0x02361000"     "Total: 35.4 MB"

### Network compilation complete.
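The allocated_space column in the memory table can be cross-checked from the offsets alone. The following is a minimal sketch, not part of the original example, which assumes that the buffers are contiguous (each buffer ends where the next one starts) and that 1 kB is 1024 bytes and 1 MB is 2^20 bytes. The variable names are illustrative, and the offsets are copied from the log above without the 0x prefix.

```matlab
% Buffer names and offsets copied from the compiler log (hex, 0x prefix removed).
names = ["InputData" "OutputResult" "SchedulerData" "SystemBuffer" ...
         "InstructionData" "ConvWeightData" "FCWeightData"];
offsetsHex = {'00000000','01a5e000','01a5f000','01a71000', ...
              '0217e000','0227d000','0235b000','02361000'};  % last entry is EndOffset

offsets = hex2dec(offsetsHex);   % byte addresses as a column vector of doubles
sizesBytes = diff(offsets);      % assumption: each buffer ends where the next begins
totalMB = offsets(end) / 2^20;   % EndOffset measured from the zero base address

% Print each buffer size in kB, then the total in MB.
for k = 1:numel(names)
    fprintf('%-16s %10.1f kB\n', names(k), sizesBytes(k) / 1024);
end
fprintf('Total: %.1f MB\n', totalMB);
```

Under these assumptions, the computed sizes agree with the values reported in the log, for example 4.0 kB for the output result buffer and 35.4 MB in total.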
Program the Bitstream onto FPGA and Download Network Weights
To deploy the network on the Zynq® UltraScale+™ MPSoC ZCU102 hardware, run the deploy function of the dlhdl.Workflow object. This function uses the output of the compile function to program the FPGA board by using the programming file. The function also downloads the network weights and biases. The deploy function checks for the Xilinx Vivado tool and the supported tool version. It then starts programming the FPGA device by using the bitstream, and displays progress messages and the time it takes to deploy the network.
hW.deploy;
### Programming FPGA Bitstream using Ethernet...
### Attempting to connect to the hardware board at 192.168.1.101...
### Connection successful
### Programming FPGA device on Xilinx SoC hardware board at 192.168.1.101...
### Attempting to connect to the hardware board at 192.168.1.101...
### Connection successful
### Copying FPGA programming files to SD card...
### Setting FPGA bitstream and devicetree for boot...
# Copying Bitstream zcu102_single.bit to /mnt/hdlcoder_rd
# Set Bitstream to hdlcoder_rd/zcu102_single.bit
# Copying Devicetree devicetree_dlhdl.dtb to /mnt/hdlcoder_rd
# Set Devicetree to hdlcoder_rd/devicetree_dlhdl.dtb
# Set up boot for Reference Design: 'AXI-Stream DDR Memory Access : 3-AXIM'
### Programming done. The system will now reboot for persistent changes to take effect.
### Rebooting Xilinx SoC at 192.168.1.101...
### Reboot may take several seconds...
### Attempting to connect to the hardware board at 192.168.1.101...
### Connection successful
### Programming the FPGA bitstream has been completed successfully.
### Loading weights to Conv Processor.
### Conv Weights loaded. Current time is 19-Jun-2024 17:04:03
### Loading weights to FC Processor.
### FC Weights loaded. Current time is 19-Jun-2024 17:04:03
Run Predictions on Micro-Doppler Signatures
Classify one input from the sample test data set by using the predict function of the dlhdl.Workflow object and display the label. The inputs to the network correspond to the sonograms of the micro-Doppler signatures for a pedestrian or a bicyclist or a combination of both.
testImg = single(testDataNoCar(:, :, :, 1));
testLabel = testLabelNoCar(1);

% Get predictions from network on single test input
testImg = dlarray(testImg, 'SSCB');
score = hW.predict(testImg, 'Profile', 'On')
### Finished writing input activations.
### Running single input activation.

              Deep Learning Processor Profiler Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                    9312021                  0.04233                       1            9312779             23.6
    conv_1                 4186343                  0.01903
    maxpool_1              1387467                  0.00631
    conv_2                 1976965                  0.00899
    maxpool_2               604660                  0.00275
    conv_3                  816212                  0.00371
    maxpool_3               121647                  0.00055
    conv_4                  146400                  0.00067
    maxpool_4                18760                  0.00009
    conv_5                   42908                  0.00020
    avgpool2d                 7226                  0.00003
    fc                        3391                  0.00002
 * The clock frequency of the DL processor is: 220MHz
score =
  5(C) × 1(B) single dlarray

    0.9956
    0.0000
    0.0000
    0.0044
    0.0000
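The profiler columns are related by simple arithmetic: latency in seconds is the cycle count divided by the clock frequency, and frames per second is its reciprocal. The following sketch, which is not part of the original example, reproduces those figures from the cycle counts printed above and finds the layer that dominates the latency. The variable names are illustrative.

```matlab
% Cycle counts copied from the single-frame profiler output above.
clockHz = 220e6;                       % DL processor clock reported by the profiler
networkCycles = 9312021;               % LastFrameLatency(cycles) for the whole network
layerNames = {'conv_1','maxpool_1','conv_2','maxpool_2','conv_3','maxpool_3', ...
              'conv_4','maxpool_4','conv_5','avgpool2d','fc'};
layerCycles = [4186343 1387467 1976965 604660 816212 121647 ...
               146400 18760 42908 7226 3391];

latencySeconds = networkCycles / clockHz;   % cycles divided by cycles per second
framesPerSecond = 1 / latencySeconds;       % throughput when processing one frame at a time

% Share of the network latency spent in each profiled layer.
layerShare = layerCycles / networkCycles;
[maxShare, idxMax] = max(layerShare);

fprintf('Latency: %.5f s, throughput: %.1f frames/s\n', latencySeconds, framesPerSecond);
fprintf('Slowest layer: %s (%.1f%% of network latency)\n', layerNames{idxMax}, 100 * maxShare);
```

In this run, the first convolution layer accounts for a little under half of the total latency, which makes it the natural starting point for any latency optimization.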
[~, idx1] = max(score);
predTestLabel = testLabelNoCar(1,1,1,idx1)
predTestLabel = categorical
ped
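The index returned by max identifies the position of the highest score in the network output. As an alternative way to turn that index into a label, the sketch below looks it up in an explicit list of class names. This is not taken from the original example: classNames is a hypothetical placeholder list, and its order is an assumption that must match the class order used when the network was trained. The score values are copied from the output above as a plain numeric vector; for a dlarray result, extractdata can be called first.

```matlab
% Hypothetical class list. The names and their order are placeholders and must be
% replaced with the class order used during training.
classNames = ["classA" "classB" "classC" "classD" "classE"];

% Scores copied from the single-input prediction above (plain numeric vector).
score = single([0.9956; 0.0000; 0.0000; 0.0044; 0.0000]);

[maxScore, idx] = max(score);        % position and value of the highest score
predictedName = classNames(idx);     % look up the class name at that position

fprintf('Predicted class index %d (%s) with score %.4f\n', idx, predictedName, maxScore);
```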
Load five random images from the sample test data set and run the predict function of the dlhdl.Workflow object to display the labels alongside the signatures. A single predict call processes all five inputs because they are concatenated along the fourth (batch) dimension.
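To illustrate what concatenation along the fourth dimension means, the following sketch builds a batch from stand-in data and inspects its dimension labels. It assumes Deep Learning Toolbox is installed, uses all-zero frames instead of the test data, and takes the 400-by-144-by-1 frame size from the image input layer in the compile log. The variable names are illustrative.

```matlab
% Stand-in data: five all-zero frames of size 400-by-144-by-1 (not the real test set).
frames = zeros(400, 144, 1, 5, 'single');

% Label the dimensions as spatial, spatial, channel, batch.
batch = dlarray(frames, 'SSCB');

batchDim = finddim(batch, 'B');       % index of the batch dimension
numFrames = size(batch, batchDim);    % number of frames in the batch

fprintf('Dimension labels: %s, batch dimension: %d, frames: %d\n', ...
    dims(batch), batchDim, numFrames);
```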
numTestFrames = size(testDataNoCar, 4);
numView = 5;
listIndex = randperm(numTestFrames, numView);
testImgBatch = single(testDataNoCar(:, :, :, listIndex));
testLabelBatch = testLabelNoCar(listIndex);

% Get predictions from network using DL HDL Toolbox on FPGA
testImgBatch = dlarray(testImgBatch, 'SSCB');
[scores, speed] = hW.predict(testImgBatch, 'Profile', 'On');
### Finished writing input activations.
### Running in multi-frame mode with 5 inputs.

              Deep Learning Processor Profiler Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                    9314346                  0.04234                       5           46556877             23.6
    conv_1                 4188705                  0.01904
    maxpool_1              1387527                  0.00631
    conv_2                 1976807                  0.00899
    maxpool_2               604685                  0.00275
    conv_3                  815776                  0.00371
    maxpool_3               121686                  0.00055
    conv_4                  146622                  0.00067
    maxpool_4                18760                  0.00009
    conv_5                   43098                  0.00020
    avgpool2d                 7234                  0.00003
    fc                        3404                  0.00002
 * The clock frequency of the DL processor is: 220MHz
[~, idx2] = max(scores, [], 1);
predTestLabelBatch = testLabelNoCar(1,1,1,idx2);

% Display the micro-Doppler signatures along with the ground truth and
% predictions.
for k = 1:numView
    index = listIndex(k);
    imagesc(testDataNoCar(:, :, :, index));
    axis xy
    xlabel('Time (s)')
    ylabel('Frequency (Hz)')
    title('Ground Truth: '+string(testLabelNoCar(index))+', Prediction FPGA: '+string(predTestLabelBatch(k)))
    drawnow;
    pause(3);
end
The image shows the micro-Doppler signatures of two bicyclists (bic+bic), which is the ground truth. The ground truth is the classification of the image against which the network prediction is compared. The network prediction retrieved from the FPGA correctly predicts that the image has two bicyclists.
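Beyond inspecting individual frames, the same comparison against the ground truth can be summarized as an accuracy figure. The sketch below is not part of the original example and uses stand-in labels, not results from the FPGA run. The label values and the variable names are illustrative assumptions.

```matlab
% Stand-in labels for illustration only. These are not results from the example.
truth     = categorical(["ped" "bic" "ped+bic" "bic+bic" "ped"]);
predicted = categorical(["ped" "bic" "ped+ped" "bic+bic" "ped"]);

isCorrect = predicted == truth;      % element-wise comparison by category name
accuracy = mean(isCorrect);          % fraction of frames classified correctly

fprintf('Correct: %d of %d (accuracy %.2f)\n', sum(isCorrect), numel(isCorrect), accuracy);
```

With real data, truth would hold the ground-truth labels of the selected frames and predicted would hold the labels derived from the FPGA scores.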
See Also
dlhdl.Workflow | dlhdl.Target | compile | deploy | predict