Integrating Deep Learning with GPU Coder into Simulink

This example shows how to integrate the CUDA® code generated for a deep learning network into Simulink®. GPU Coder™ does not support code generation for Simulink blocks, but you can still use the computational power of GPUs in Simulink by generating a dynamic-link library (DLL) with GPU Coder and then integrating it into Simulink as an S-Function block by using the Legacy Code Tool. To illustrate this concept, the example uses Lane Detection Optimized with GPU Coder (GPU Coder). The original example used a C++ file with OpenCV functions to read the frames, draw lanes, and overlay frame rate information on the video output. This example uses Simulink blocks from the Computer Vision System Toolbox™ to perform the same operations.

Prerequisites

  • CUDA enabled NVIDIA® GPU with compute capability 3.2 or higher.

  • NVIDIA CUDA toolkit and driver.

  • NVIDIA cuDNN library.

  • Environment variables for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-party Products (GPU Coder). For setting up the environment variables, see Setting Up the Prerequisite Products (GPU Coder).

  • GPU Coder™ Interface for Deep Learning Libraries support package. To install this support package, use the Add-On Explorer.

Verify GPU Environment

Use the coder.checkGpuInstall function to verify that the compilers and libraries necessary for running this example are set up correctly.

envCfg = coder.gpuEnvConfig('host');
envCfg.DeepLibTarget = 'cudnn';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);
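
Because the Quiet property suppresses the printed report, it can help to capture the check results in a variable and inspect them. The following is a minimal sketch, assuming that coder.checkGpuInstall returns a structure whose scalar fields indicate the outcome of each check. The loop reads the field names at run time rather than assuming specific names.

```matlab
% Configure the same checks as above and capture the results structure
envCfg = coder.gpuEnvConfig('host');
envCfg.DeepLibTarget = 'cudnn';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
results = coder.checkGpuInstall(envCfg);

% List every scalar check that did not pass or was not run
checks = fieldnames(results);
for k = 1:numel(checks)
    value = results.(checks{k});
    if (islogical(value) || isnumeric(value)) && isscalar(value) && ~value
        fprintf('Check not passed or not run: %s\n', checks{k});
    end
end
```

Checks that were not requested in envCfg are also reported by this loop, so treat the list as a starting point for troubleshooting rather than a list of failures.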

Workflow

This diagram illustrates the general procedure for using the Legacy Code Tool to integrate the CUDA code generated for a deep learning network into Simulink.

Get Pretrained SeriesNetwork

[laneNet,coeffMeans,coeffStds] = getLaneDetectionNetwork();

The architecture of the pretrained SeriesNetwork is similar to AlexNet except that the last few layers are replaced by a smaller, fully connected layer and regression output layer. This network takes an image input and outputs two lane boundaries that correspond to the left and right lanes of the ego vehicle. Each lane boundary is represented by the parabolic equation $y = ax^2 + bx + c$. Here, $y$ is the lateral offset and $x$ is the longitudinal distance from the vehicle. The network outputs the three parameters $a$, $b$, and $c$ that describe the parabolic equation for each of the left and right lane boundaries. The variables coeffMeans and coeffStds contain the mean and standard deviation values from the trained network. These values are required during simulation.
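
As an illustration of how six coefficients (three per lane) and the mean and standard deviation values might be combined, the following minimal sketch evaluates a parabola y = a*x^2 + b*x + c for each lane. All numeric values are hypothetical, the range of x is an assumption, and the sketch assumes that the network output is z-score normalized so that multiplying by the standard deviation and adding the mean recovers the coefficients.

```matlab
% Hypothetical values for illustration only; not taken from the trained network
coeffMeans = [0 0 1.8 0 0 -1.8];
coeffStds  = [0.01 0.1 0.5 0.01 0.1 0.5];
netOut     = [1 -2 0 -1 2 0];          % assumed normalized network output

% Undo the assumed z-score normalization: [a b c] for left lane, then right lane
params = netOut .* coeffStds + coeffMeans;

% Evaluate y = a*x^2 + b*x + c at assumed longitudinal distances x
x = 3:30;
yLeft  = polyval(params(1:3), x);      % lateral offsets of the left boundary
yRight = polyval(params(4:6), x);      % lateral offsets of the right boundary
```

polyval expects coefficients in descending powers, which matches the [a b c] ordering used here.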

Main Entry Point Function

This example uses the detect_lane.m entry-point function. The detect_lane function computes the $x$ and $y$ coordinates corresponding to the lane positions from the $a$, $b$, and $c$ parameters. The detect_lane function also performs computations that map the $x$ and $y$ coordinates to image coordinates.

Generate a Dynamic Link Library (DLL) for the Function

To run the detect_lane function on the GPU from Simulink, generate a shared library by using GPU Coder. The inputs to the detect_lane function are the video frame and the mean and standard deviation values. The values passed by using the -args option reflect the size and data type of these inputs. After code generation, copy the generated library to the top-level folder.

Isize = single(zeros(227,227));

cfg = coder.gpuConfig('dll');
cfg.TargetLang = 'C++';
cfg.GenerateReport = true;
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
codegen -args {ones(227,227,3,'single'),ones(1,6,'double'),ones(1,6,'double')} -config cfg detect_lane

if ispc
    copyfile(fullfile(pwd, 'codegen','dll', 'detect_lane','detect_lane.dll'), pwd);
else
    copyfile(fullfile(pwd, 'codegen','dll', 'detect_lane','detect_lane.so'), pwd);
end

Code generation successful: To view the report, open('codegen/dll/detect_lane/html/report.mldatx').
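
The example values passed to -args only communicate the size and class of each input. As an alternative sketch, the same information can be described with coder.typeof objects, which makes the intended sizes explicit. The variable names frameType, meansType, stdsType, and inputTypes are hypothetical, and the codegen call is left as a comment so that the snippet does not trigger code generation.

```matlab
% Describe each input by class and size instead of passing example values
frameType = coder.typeof(single(0), [227 227 3]);  % video frame
meansType = coder.typeof(0, [1 6]);                % mean values (double)
stdsType  = coder.typeof(0, [1 6]);                % standard deviation values (double)
inputTypes = {frameType, meansType, stdsType};

% Equivalent invocation, assuming cfg is the coder.gpuConfig object defined earlier:
% codegen -args inputTypes -config cfg detect_lane
```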

Generate and Compile S-Function

The lane detection example depends on the NVIDIA CUDA runtime, cuBLAS, and cuDNN libraries. The Legacy Code Tool data structure specifies:

  • A name for the S-function

  • Specifications for the existing C++ function

  • All library and header files required for compilation and the file paths

  • Options for the generated S-function

After defining the structure, use the legacy_code function to:

  • Initialize the Legacy Code Tool data structure for the C++ function

  • Generate an S-function for use during simulation

  • Compile and link the generated S-function into a dynamically loadable executable (MEX)

  • Generate a masked S-function block for calling the generated S-function

srcPath = fullfile(pwd, 'codegen', 'dll', 'detect_lane');

if ispc
    cuPath = getenv('CUDA_PATH');
    cudaLibPath = fullfile(cuPath,'lib','x64');
    cudaIncPath = fullfile(cuPath,'include');

    cudnnPath = getenv('NVIDIA_CUDNN');
    cudnnIncPath = fullfile(cudnnPath,'include');
    cudnnLibPath = fullfile(cudnnPath,'lib','x64');

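    % cuBLAS ships with the CUDA toolkit on Windows, so reuse the CUDA paths.
    % Without these definitions, headerPath and libPath below are undefined on Windows.
    cublasLibPath = cudaLibPath;
    cublasIncPath = cudaIncPath;

    libs = {'detect_lane.lib','cudart.lib','cublas.lib','cudnn.lib'};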

else
    [~,nvccPath] = system('which nvcc');
    nvccPath = regexp(nvccPath, '[\f\n\r]', 'split');
    cuPath = erase(nvccPath{1},'/bin/nvcc');
    cudaLibPath = fullfile(cuPath,'lib64');
    cudaIncPath = fullfile(cuPath,'include');

    cudnnPath = getenv('NVIDIA_CUDNN');
    cudnnIncPath = fullfile(cudnnPath,'include');
    cudnnLibPath = fullfile(cudnnPath,'lib64');

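    [~,cmdout] = system('ldconfig -p | grep "libcublas.so "');
    % Locate the library path in the ldconfig output rather than assuming a fixed column offset
    pathStrIdx = strfind(cmdout,'/usr/');
    cublasLibPath = fileparts(strtok(cmdout(pathStrIdx(1):end), newline));
    cublasIncPath = '/usr/include';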

    libs = {'detect_lane.so','libcudart.so','libcublas.so','libcudnn.so'};
end

headerPath = {srcPath;cudnnIncPath;cudaIncPath;cublasIncPath};
libPath = {srcPath;cudnnLibPath;cudaLibPath;cublasLibPath};

% Define the Legacy Code Tool data structure
def = legacy_code('initialize');
def.SFunctionName = 'lane_detect_sfun';
def.OutputFcnSpec = 'void detect_lane(single u1[154587],double u2[6],double u3[6],uint8 y1[1],single y2[56],single y3[56])';
def.IncPaths = headerPath;
def.HeaderFiles = {'detect_lane.h'};
def.LibPaths = libPath;
def.HostLibFiles = libs;
def.Options.useTlcWithAccel = false;
def.Options.language = 'C++';

legacy_code('sfcn_cmex_generate', def);
legacy_code('compile', def);

### Start Compiling lane_detect_sfun
    mex('lane_detect_sfun.cpp', '-I/mathworks/home/lnarasim/Documents/MATLAB/Examples/deeplearning_shared-ex59437955/codegen/dll/detect_lane', '-I/mathworks/hub/3rdparty/R2019b/4462385/glnxa64/cuDNN/cuda/include', '-I/usr/local/cuda/include', '-I/usr/include', '-I/mathworks/home/lnarasim/Documents/MATLAB/Examples/deeplearning_shared-ex59437955', '/mathworks/home/lnarasim/Documents/MATLAB/Examples/deeplearning_shared-ex59437955/codegen/dll/detect_lane/detect_lane.so', '/usr/local/cuda/lib64/libcudart.so', '/usr/lib/x86_64-linux-gnu/libcublas.so', '/mathworks/hub/3rdparty/R2019b/4462385/glnxa64/cuDNN/cuda/lib64/libcudnn.so')
Building with 'g++'.
MEX completed successfully.
### Finish Compiling lane_detect_sfun
### Exit

The OutputFcnSpec argument specifies the function that the S-function calls at each time step. The detect_lane.h header file in the codegen folder provides the function specification information. Map the detect_lane function arguments to the ports of the Simulink S-Function block by using uniquely numbered u tokens for input ports and uniquely numbered y tokens for output ports. You must also map the code generation data types defined in tmwtypes.h to data types that Simulink supports. For more information, see Declaring Legacy Code Tool Function Specifications. Because this example already contains a complete Simulink model, it does not generate the S-Function block. To generate the S-Function block, use:

legacy_code('slblock_generate', def);
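
The port widths in the OutputFcnSpec string follow from the sizes of the detect_lane arguments: the 227-by-227-by-3 frame is passed as a one-dimensional vector of 227*227*3 = 154587 elements. As a minimal sketch, the specification string can be assembled from those sizes instead of typing the numbers by hand. The variable names are hypothetical, and the output width of 56 is copied from the specification used in this example.

```matlab
% Sizes of the detect_lane arguments (output width copied from the spec above)
frameSize  = [227 227 3];   % input layer size of the network
numCoeffs  = 6;             % mean and standard deviation vectors
numLaneOut = 56;            % elements in each lane point output

% Build the function specification with flattened (one-dimensional) port widths
spec = sprintf(['void detect_lane(single u1[%d],double u2[%d],double u3[%d],' ...
                'uint8 y1[1],single y2[%d],single y3[%d])'], ...
                prod(frameSize), numCoeffs, numCoeffs, numLaneOut, numLaneOut);
disp(spec);
```

The resulting string could then be assigned to def.OutputFcnSpec before the sfcn_cmex_generate step.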

Create Simulink Model for Lane Detection

Move all the pre- and post-processing operations in the main_lanenet.cpp file of the original example into Simulink. The Input Video Processing subsystem removes the normalization performed by the multimedia reader block and resizes the input video frame to the input layer size of the lane detection network, 227-by-227-by-3. The subsystem then converts the three-dimensional video frame into the one-dimensional vector required by the detect_lane library. The Lane Points enabled subsystem processes the left and right lane points to make them suitable for the Draw Lanes block. The Simulink model uses a video display to show lane detection on a sample video.

open_system('main_lanenet');
set_param('main_lanenet', 'SimulationCommand', 'update');
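
For readers who want to see the Input Video Processing steps outside of Simulink, the following is a rough MATLAB sketch of the same sequence: undo the normalization, resize to 227-by-227-by-3, and flatten to a vector. It rests on several assumptions: the frame is a hypothetical random image, the normalization is assumed to be a scaling to the range [0,1], and the resize uses simple nearest-neighbor indexing so that the snippet has no toolbox dependencies. The actual subsystem may use different scaling, interpolation, or element ordering.

```matlab
% Hypothetical input frame, assumed to be normalized to [0,1]
frame = rand(360, 640, 3, 'single');

% Undo the assumed normalization to recover the 0-255 intensity range
frame255 = frame * 255;

% Nearest-neighbor resize to the 227-by-227 input layer size
rowIdx = round(linspace(1, size(frame255, 1), 227));
colIdx = round(linspace(1, size(frame255, 2), 227));
resized = frame255(rowIdx, colIdx, :);

% Flatten the 3-D frame into the 154587-element vector passed to the library
frameVec = reshape(resized, [], 1);
```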

Run Simulink Model (Lane Detection)

To see lane detection on a sample video, run the simulation.

sim('main_lanenet', 'timeout', 30);

Cleanup

Close the Simulink model.

close_system('main_lanenet');