YOLO v2 Vehicle Detector with Live Camera Input on Zynq-Based Hardware
Introduction
The YOLO v2 Vehicle Detector with Live Camera Input example extends the Deploy and Verify YOLO v2 Vehicle Detector on FPGA example by adding live HDMI video input and by targeting the postprocessing logic to the ARM processor of the Xilinx® Zynq® UltraScale+™ MPSoC ZCU102 Evaluation Kit. The example uses a new RGB with DL Processor reference design. The reference design passes the HDMI input to the preprocessing logic and also writes the input frame to PS DDR. After preprocessing, the design writes the resized and normalized images to PL DDR, where the DL processor can access them. After the DL processor writes the output back to DDR, the postprocessing code reads the output frames to calculate and overlay bounding boxes. These modified output frames are returned on the HDMI output and can also be accessed in Simulink® by using the Video Capture HDMI block.
Setup Prerequisites
This example follows the algorithm development workflow that is detailed in the Developing Vision Algorithms for Zynq-Based Hardware (Vision HDL Toolbox Support Package for Xilinx Zynq-Based Hardware) example. If you have not already done so, please work through that example to gain a better understanding of the required workflow.
If you have not yet done so, run through the guided setup wizard portion of the Zynq support package installation. You might have already completed this step when you installed this support package.
On the MATLAB Home tab, in the Environment section of the Toolstrip, click Add-Ons > Manage Add-Ons. Locate Vision HDL Toolbox Support Package for Xilinx Zynq-Based Hardware, and click Setup.
The guided setup wizard performs a number of initial setup steps, and confirms that the target can boot and that the host and target can communicate.
For more information, see Setup for Vision Hardware (Vision HDL Toolbox Support Package for Xilinx Zynq-Based Hardware).
Input Video File and Network
This example uses the PandasetCameraData.mp4 video, created from the PandaSet data set, as the input, and it uses the network from the yolov2VehicleDetector32Layer.mat file. These files are approximately 47 MB and 2 MB in size. Download the file from the MathWorks support website and unzip the downloaded file.
PandasetZipFile = matlab.internal.examples.downloadSupportFile('visionhdl','PandasetCameraData.zip');
[outputFolder,~,~] = fileparts(PandasetZipFile);
unzip(PandasetZipFile,outputFolder);
pandasetVideoFile = fullfile(outputFolder,'PandasetCameraData');
addpath(pandasetVideoFile);
Pixel Stream Model Design Under Test
The DUT in this example selects a region of interest (ROI) from the input frames to meet the requirements of the DL processor. The model selects a 1000-by-500 region of the incoming 1920-by-1080 video.
Because the DL processor IP core cannot keep up with the incoming frame rate from the camera, the model also includes frame drop logic. The model processes a frame only when the DL processor IP core is ready to accept data.
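As a frame-based reference for the ROI selection, the following MATLAB sketch crops a region of the same size from a full HD frame. This is a minimal behavioral sketch: the frame and ROI dimensions come from this example, but the centered ROI position and the variable names are assumptions, and the actual DUT operates on a pixel stream with frame drop logic.
% Behavioral sketch of the ROI selection (assumed centered ROI).
inputFrame = uint8(randi(255,1080,1920,3)); % stand-in for one 1920-by-1080 HDMI frame
roiWidth  = 1000;
roiHeight = 500;
rowStart  = floor((1080-roiHeight)/2) + 1;  % assumed vertical offset
colStart  = floor((1920-roiWidth)/2) + 1;   % assumed horizontal offset
roiFrame  = inputFrame(rowStart:rowStart+roiHeight-1, colStart:colStart+roiWidth-1, :);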
Configure Deep Learning Processor and Generate IP Core
The deep learning processor IP core accesses the preprocessed input from the DDR memory, performs the vehicle detection, and loads the output back into the memory. To generate a deep learning processor IP core that has the required interfaces, create a deep learning processor configuration by using the dlhdl.ProcessorConfig (Deep Learning HDL Toolbox) class. In the processor configuration, set the InputRunTimeControl and OutputRunTimeControl parameters. These parameters indicate the interface type for interfacing between the input and output of the deep learning processor. To learn about these parameters, see Interface with the Deep Learning Processor IP Core (Deep Learning HDL Toolbox). In this example, the deep learning processor uses the register mode for input and output runtime control.
hPC = dlhdl.ProcessorConfig;
hPC.InputRunTimeControl = "register";
hPC.OutputRunTimeControl = "register";
Set the TargetPlatform property of the processor configuration object to Generic Deep Learning Processor. This option generates a custom generic deep learning processor IP core.
hPC.TargetPlatform = 'Generic Deep Learning Processor';
Use the setModuleProperty method to set the properties of the conv module of the deep learning processor. These properties can be tuned based on the design choice to ensure that the design fits on the FPGA. To learn more about these parameters, see setModuleProperty (Deep Learning HDL Toolbox). In this example, LRNBlockGeneration is turned on and SegmentationBlockGeneration is turned off to support the YOLO v2 vehicle detection network. ConvThreadNumber is set to 9.
hPC.setModuleProperty('conv','LRNBlockGeneration', 'on');
hPC.setModuleProperty('conv','SegmentationBlockGeneration', 'off');
hPC.setModuleProperty('conv','ConvThreadNumber',9);
This example uses the Xilinx ZCU102 board to deploy the deep learning processor. Use the hdlsetuptoolpath function to add the Xilinx Vivado synthesis tool path to the system path.
hdlsetuptoolpath('ToolName','Xilinx Vivado','ToolPath','C:\Xilinx\Vivado\2022.1\bin\vivado.bat');
Use the dlhdl.buildProcessor function with the hPC object to generate the deep learning IP core. It takes some time to generate the deep learning processor IP core.
dlhdl.buildProcessor(hPC);
The generated IP core contains a standard set of registers and the generated IP core report. The IP core report is generated in the same folder as the IP core, with the name testbench_ip_core_report.html.
The IP core name and IP core folder are required in a subsequent step, in the 'Set Target Reference Design' task of the IP core generation workflow of the DUT. The IP core report also has the address map of the registers that are needed for handshaking with the input and output of the deep learning processor IP core.
The registers InputValid, InputAddr, and InputSize contain the values of the corresponding handshaking signals that are required to write the preprocessed frame into DDR memory. The DUT uses the inputNext register to pulse the inputNext signal after the data is written into memory. These register addresses are set up in the helperSLYOLOv2Setup.m script. The other registers listed in the report are read and written by using MATLAB. For more details on the interface signals, see the Design Processing Mode Interface Signals section of Interface with the Deep Learning Processor IP Core (Deep Learning HDL Toolbox).
Generate and Deploy Bitstream to FPGA
For simulation, use the model from the Integrate YOLO v2 Vehicle Detector System on SoC example, because it uses a reduced input image size and simulates faster. Start the targeting workflow by right-clicking the YOLOv2 Preprocessing subsystem in the vzYOLOv2DetectorOnLiveCamera model and selecting HDL Code > HDL Workflow Advisor.
open_system('vzYOLOv2DetectorOnLiveCamera');
In step 1.1, select the IP Core Generation workflow and the target platform 'ZCU102 with FMC-HDMI-CAM'.
In step 1.2, the reference design is set to "RGB with DL Processor". The DL Processor IP name and the DL Processor IP location specify the name and location of the generated deep learning processor IP core, and are obtained from the IP core report.
In step 1.3, map the target platform interfaces to the input and output ports of the DUT.
- Pixel Streaming Interface: The pixel streaming data signals R, G, and B of the algorithm are mapped to the R, G, and B signals of the target platform interface. Similarly, the pixel control bus is mapped to the Pixel Control Bus signal of the target platform interface.
- AXI4-Lite Interface: The DUTProcstart register is mapped to an AXI4-Lite register. When this register is written, it triggers the input handshaking logic. Choosing the AXI4-Lite interface directs HDL Coder to generate a memory-mapped register in the FPGA fabric. You can access this register from software running on the ARM processor.
- AXI4 Master DDR interface: The AXIWriteCtrlInDDR, AXIReadCtrlInDDR, AXIReadDataDDR, AXIWriteCtrlOutDDR, AXIWriteDataDDR, and AXIReadCtrlOutDDR ports of the DUT are mapped to the AXI4 Master DDR interface. The read channel of the AXI4 Master DDR interface is mapped to the AXI4 Master DDR Read interface, and the write channel is mapped to the AXI4 Master DDR Write interface. This interface transfers data between the Preprocess DUT and the PL DDR. Using the write channel of this interface, the preprocessed data is written to the PL DDR, where it can be accessed by the deep learning processor IP.
- AXI4 Master DL interface: The AXIReadDataDL, AXIReadCtrlInDL, AXIWriteCtrlInDL, AXIReadCtrlOutDL, AXIWriteDataDL, and AXIWriteCtrlOutDL ports of the DUT are mapped to the AXI4 Master DL interface. The read channel of the AXI4 Master DL interface is mapped to the AXI4 Master DL Read interface, and the write channel is mapped to the AXI4 Master DL Write interface. This interface is used for communication between the Preprocess DUT and the deep learning processor IP. In this example, it implements the input handshaking logic with the deep learning processor.
Step 2 prepares the design for generation by running design checks.
Step 3 generates HDL code for the IP core.
Step 4.1 integrates the newly generated IP core into the larger Vision Zynq reference design.
In Step 4.2, the workflow generates a targeted hardware interface model and, if the Embedded Coder Zynq support package is installed, a Zynq software interface model. Because this example uses the shipping example model, clear the Generate Simulink software interface model and Generate host interface script check boxes.
Click the Run this task button with these settings. The rest of the workflow generates a bitstream for the FPGA, downloads it to the target, and reboots the board.
Because this process can take 3-4 hours, you can choose to bypass this step by using a pre-generated bitstream for this example that ships with the product and was placed on the SD card during setup.
Note: This bitstream was generated with the HDMI pixel clock constrained to 148.5 MHz for a maximum resolution of 1080p HDTV at 60 frames-per-second.
To use this pre-generated bitstream, execute the following commands to copy the device tree file to the current working directory and to load the bitstream onto the hardware.
copyfile(fullfile(matlabshared.supportpkg.getSupportPackageRoot, ...
    "toolbox","shared","supportpackages","visionzynq","bin", ...
    "target","sdcard","visionzynq-zcu102-hdmicam","visionzynq-customtgt", ...
    "visionzynq-zcu102-hdmicam-dl.dtb"),"visionzynq-zcu102-hdmicam-dl.dtb");
vz = visionzynq();
changeFPGAImage(vz,'visionzynq-zcu102-hdmicam-dl-yolov2.bit', 'visionzynq-zcu102-hdmicam-dl.dtb')
To configure the Zynq device with this bitstream file at a later stage, execute the following commands:
To copy the dtb file to the current working directory, use this command:
copyfile(fullfile(matlabshared.supportpkg.getSupportPackageRoot, ...
    "toolbox","shared","supportpackages","visionzynq","bin", ...
    "target","sdcard","visionzynq-zcu102-hdmicam","visionzynq-refdes", ...
    "visionzynq-zcu102-hdmicam-dl.dtb"),"visionzynq-zcu102-hdmicam-dl.dtb");
vz = visionzynq();
downloadImage(vz,'FPGAImage', '<PROJECT_FOLDER>\vivado_ip_prj\vivado_prj.runs\impl_1\design_1_wrapper.bit', 'DTBImage', 'visionzynq-zcu102-hdmicam-dl.dtb')
Compile and Deploy YOLO v2 Deep Learning Network
Now that the bitstream is loaded, you can deploy the end-to-end deep learning application on the FPGA. Update the bitstream build information in the MAT file generated during the IP core generation. The name of the MAT file is dlprocessor.mat and it is located in cwd\dlhdl_prj\, where cwd is your current working folder. Copy the file to the present working folder. Because this MAT file is generated using the target platform Generic Deep Learning Processor, it does not contain the board and vendor information. Use the updateBitstreamBuildInfo.m function to update the board and vendor information and generate a new MAT file with the same name as the generated bitstream.
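For example, assuming the dlhdl_prj project folder is under the current working folder, you can copy the MAT file with a command like this:
% Copy dlprocessor.mat from the IP core generation project folder
% to the current working folder.
copyfile(fullfile('dlhdl_prj','dlprocessor.mat'), pwd);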
bitstreamName = 'design_1_wrapper';
updateBitstreamBuildInfo('dlprocessor.mat', [bitstreamName,'.mat']);
Create a target object to connect your target device to the host computer.
hTarget = dlhdl.Target('Xilinx', 'Interface', 'Ethernet', 'IpAddr', '192.168.4.2');
Create a deep learning HDL workflow object by using the dlhdl.Workflow class. Before running this command, make sure that the generated bit file is available in the current working directory with the name 'visionzynq-zcu102-hdmicam-dl-yolov2.bit'.
hW = dlhdl.Workflow('Network', net, 'Bitstream', 'visionzynq-zcu102-hdmicam-dl-yolov2.bit', 'Target', hTarget);
Compile the network, net, by using the dlhdl.Workflow object.
frameBufferCount = 2;
compile(hW, 'InputFrameNumberLimit', frameBufferCount);
Run the deploy function of the dlhdl.Workflow object to download the network weights and biases on the Zynq UltraScale+ MPSoC ZCU102 board.
deploy(hW, 'ProgramBitStream', false);
Clear the DLHDL workflow object and hardware target.
clear hW;
clear hTarget;
Software Interface Model
You can run this model in External mode on the ARM processor, or you can use this model to fully deploy a software design. (This model can be deployed only if Embedded Coder and the Embedded Coder Support Package for Xilinx Zynq Platform are installed.)
open_system('vzYOLOv2PostProcess');
Before running this model, you must perform additional setup steps to configure the Xilinx cross-compiling tools. For more information, see Setup for ARM Targeting. In the postprocessing model, the YOLOv2 Postprocessing subsystem is the same as in the Integrate YOLO v2 Vehicle Detector System on SoC example. The postprocessing model configures the DL processor for streaming mode up to a specified number of frames. The output data written to the PL DDR by the DL processor is read by using the AXI4 Stream IIO Read block.
Once the bounding boxes and scores are calculated in the YOLOv2PostprocessDUT block, the valid signal goes high. This valid signal goes to both the draw Rect and set ROI blocks and is used for synchronization between the input frame written to the DDR and the calculated bounding boxes and scores. AXI4-Lite registers transfer the control signals between the FPGA and the ARM processor.
In this example, the software interface model contains only the postprocessing logic, and does not include a Video Capture HDMI block. This model is intended to run on the board independently from Simulink and does not return any data from the board. To view the output video in Simulink, you can use a different model that contains a Video Capture HDMI block and runs while your deep learning design is deployed and running on the board.
Open the vzYOLOv2PostProcess model and click 'Build, Deploy and Start'. This mode runs the algorithm on the ARM processor on the Zynq board.
After opening the vzGettingStarted model, in the Video Capture HDMI block, set 'Video source' to 'HDMI input', 'Frame size' to '1080p HDTV (1920x1080p)', 'Pixel Format' to 'RGB', and 'Capture Point' to 'Output from FPGA user logic (B)'. In the To Video Display block, set 'Input Color Format' to 'RGB', and then run the model. The bounding boxes and scores that are calculated on the ARM processor are overlaid on the corresponding frame and displayed by the To Video Display block in the vzGettingStarted model.
To stop the executable on ARM, run this command:
vz.stopExecutable('/tmp/vzYOLOv2PostProcess.elf');
Related Examples
More About
- Deep Learning Processing of Live Video (Vision HDL Toolbox Support Package for Xilinx Zynq-Based Hardware)