
Generate Code for LSTM Network and Deploy on Cortex-M Target

This example demonstrates how to generate floating-point C code for a sequence-to-sequence long short-term memory (LSTM) network. You generate a processor-in-the-loop (PIL) application that makes predictions at each step of an input time series.

This example shows three approaches for handling variable sequence length inputs to the LSTM network in the generated code. For each approach, you generate a PIL application that does one of the following:

  • Accepts a single observation of variable sequence length

  • Accepts multiple observations of variable sequence lengths

  • Leverages the stateful behavior of the LSTM network to accept an input of fixed sequence length

This example uses accelerometer sensor data from a smartphone carried on the body and makes predictions about the activity of the wearer.

The wearer's movements are classified into one of five categories: dancing, running, sitting, standing, and walking.

For more information on training the network, see Sequence Classification Using Deep Learning (Deep Learning Toolbox).

When you generate and run the PIL executable, the generated C code runs on an STMicroelectronics® STM32F746G-Discovery board, which uses an ARM Cortex®-M7 based microcontroller.

You can also deploy this example on other STMicroelectronics Discovery boards and STMicroelectronics Nucleo boards that use ARM Cortex-M processors. For deployment on these devices, you must install the corresponding support package and the associated required products, as described in the support package documentation.

For deployment on STMicroelectronics Discovery boards, install the Embedded Coder Support Package for STMicroelectronics Discovery Boards.

Supported STMicroelectronics Discovery boards:

  • STM32F746G-Discovery

  • STM32F769I-Discovery

  • STM32F4-Discovery

For deployment on STMicroelectronics Nucleo boards, install the Simulink Coder Support Package for STMicroelectronics Nucleo Boards.

Supported STMicroelectronics Nucleo boards:

  • Nucleo-F401RE

  • Nucleo-F103RB

  • Nucleo-F302R8

  • Nucleo-F031K6

  • Nucleo-L476RG

  • Nucleo-L053R8

  • Nucleo-F746ZG

  • Nucleo-F411RE

  • Nucleo-F767ZI

  • Nucleo-H743ZI/Nucleo-H743ZI2
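
Before you build, you can optionally confirm from the MATLAB command line which support packages are already installed. This check is not part of the original workflow; it uses the matlabshared.supportpkg.getInstalled function, which lists the installed support packages.

% Optional: display the support packages that are currently installed.
matlabshared.supportpkg.getInstalled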

Required Hardware and Peripherals

  • STM32F746G-Discovery board

  • USB type A to Mini-B cable

Connect the hardware board to the host computer by using a USB type A to Mini-B cable. To install drivers for the board, see Install Drivers for STMicroelectronics STM32 Boards (Embedded Coder).

Set Code Configuration Parameters

Create Code Configuration Object

Create a coder.EmbeddedCodeConfig object cfg for generating a static library.

cfg = coder.config('lib','ecoder',true);

Configure Object for PIL Execution

To enable PIL-based execution, set VerificationMode to 'PIL'.

cfg.VerificationMode = 'PIL';

To generate generic C code that does not depend on third-party libraries, set TargetLibrary to 'none'.

cfg.DeepLearningConfig = coder.DeepLearningConfig('TargetLibrary', 'none');

Specify Target Hardware

To specify the target hardware, create a coder.Hardware object. Assign this object to the Hardware property of the object cfg.

cfg.Hardware = coder.hardware('STM32F746G-Discovery'); 

Set PIL Communication Interface

Set up a serial PIL communication interface.

cfg.Hardware.PILInterface = 'Serial';

To determine the COM port for serial communication, follow steps 2 to 4 in Code Verification and Validation with PIL and Monitoring and Tuning (Embedded Coder). Then, set the PILCOMPort property.

cfg.Hardware.PILCOMPort = 'COM4';
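
If you are unsure which COM port the board uses, an optional way to narrow it down (assuming MATLAB R2019b or later, where serialportlist is available) is to list the serial ports that are available for connection and look for the ST-LINK virtual COM port of the Discovery board.

% Optional: list the serial ports that are available for connection.
serialportlist("available")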

Limit Stack Size

The default stack size is much larger than the memory available on the hardware this example uses. Set the stack size to a smaller value, for example, 512 bytes.

cfg.StackUsageMax = 512;

To view the build log at the command line, enable the verbose build option.

cfg.Verbose = 1;

Enable ARM Cortex-M Code Replacement Library

To generate optimized code, use the ARM Cortex-M (CMSIS) code replacement library (CRL).

cfg.CodeReplacementLibrary = 'ARM Cortex-M (CMSIS)';

Approach 1: Generate PIL Executable That Accepts a Single Observation of Variable Sequence Length

lstmNetwork_predict Entry-Point Function

This entry-point function takes an input sequence and passes it to a trained LSTM network for prediction. Specifically, this function uses the LSTM network trained in the Sequence-to-Sequence Classification Using Deep Learning example.

The function loads the network object from the activityRecognitionNet.mat file into a persistent variable. The function reuses this persistent object on subsequent prediction calls.

type('lstmNetwork_predict.m')
function out = lstmNetwork_predict(in) %#codegen

% Copyright 2019-2021 The MathWorks, Inc. 

persistent mynet;

if isempty(mynet)
    mynet = coder.loadDeepLearningNetwork('activityRecognitionNet.mat');
end

% pass in input   
out = predict(mynet,in); 

Specify Input Type and Size

Specify the type and size of the input argument to the codegen command by using the coder.typeof function.

For this example, the input is of single data type with a feature dimension value of three and a variable sequence length.

Specifying the sequence length as variable-size enables the generated code to perform prediction on an input sequence of any length.

matrixInput = coder.typeof(single(0),[3 Inf],[false true]);

Generate PIL Executable

Run the codegen command to generate code and the PIL executable.

codegen -config cfg lstmNetwork_predict -args {matrixInput} -report

Run Generated PIL Executable

Load the MAT-file XValidateData.mat. This MAT-file stores the variable XValidateData, which contains sample time series of sensor readings on which you can test the generated code. Also, load the MAT-file labelsActivity.mat, which contains the activity labels.

load XValidateData.mat
load labelsActivity.mat

Call lstmNetwork_predict_pil on the first observation, which has a sequence length of six. You can also call the same PIL executable with observations of other sequence lengths, as shown below.

YPred1 = lstmNetwork_predict_pil(XValidateData{1});
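
For example, because the input type has a variable-size sequence dimension, the same PIL executable also accepts the second observation, whose sequence length can differ from the first. The output variable name YPred1_2ndObservation used here is illustrative.

% Run the same PIL executable on the second observation.
YPred1_2ndObservation = lstmNetwork_predict_pil(XValidateData{2});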

Clear the PIL executable.

clear lstmNetwork_predict_pil;

YPred1 is a 5-by-6 numeric matrix containing the probabilities of the five classes for each of the six time steps.

For each time step, find the predicted class by calculating the index of the maximum probability value.

[~, maxIndex] = max(YPred1, [], 1);

Associate the index of the maximum probability value with the corresponding label.

Display the associated labels. From the results, you can see the activities that the network predicted for the first observation.

predictedLabels_1stObservation = labels(maxIndex);
disp(predictedLabels_1stObservation)

Approach 2: Generate PIL Executable That Accepts Multiple Observations of Different Sequence Lengths

If you want to perform prediction on many observations at once, you can group the observations together in a cell array and pass the cell array for prediction. The cell array must be a column cell array, and each cell must contain one observation.

Each observation must have the same feature dimension, but their sequence lengths might vary.
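
For illustration only, the following sketch shows how a cell array with this layout looks. The observations obs1 and obs2 and their sizes are hypothetical; in this example, XValidateData already has the required layout.

% Two hypothetical observations with the same feature dimension (3 features)
% but different sequence lengths (7 and 12 time steps).
obs1 = single(rand(3,7));
obs2 = single(rand(3,12));

% Group the observations into a column cell array, one observation per cell.
observations = {obs1; obs2};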

Specify Input Type and Size

In this example, XValidateData contains four observations. To generate a PIL executable that can accept XValidateData as an input, specify the input type to be a 4-by-1 cell array.

Further, specify that each cell is of the same type as matrixInput, the type that you specified for the single observation in the previous codegen command.

matrixInput = coder.typeof(single(0),[3 Inf],[false true]);
cellInput = coder.typeof({matrixInput}, [4 1]);

Generate PIL Executable

Run the codegen command to generate code and PIL executable.

codegen -config cfg lstmNetwork_predict -args {cellInput} -report

Run the PIL Executable

Load the MAT-file XValidateData.mat. This MAT-file stores the variable XValidateData, which contains sample time series of sensor readings on which you can test the generated code. Also, load the MAT-file labelsActivity.mat, which contains the activity labels.

load XValidateData.mat;
load labelsActivity.mat;

Run the PIL executable for all observations.

YPred2 = lstmNetwork_predict_pil(XValidateData);

Clear the PIL executable.

clear lstmNetwork_predict_pil;

The output is a 4-by-1 cell array of predictions for the four observations passed to lstmNetwork_predict_pil.

disp(YPred2)

For each time step of the first observation, find the predicted class by calculating the index of the maximum probability. Then display the associated labels.
[~, maxIndex] = max(YPred2{1}, [], 1);
predictedLabels_1stObservation = labels(maxIndex);
disp(predictedLabels_1stObservation)
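
To get the predicted labels for every observation rather than only the first one, you can repeat the same lookup in a loop. This is a minimal sketch; the variable name predictedLabelsAll is illustrative.

% Find the predicted labels for each observation in the cell array.
predictedLabelsAll = cell(numel(YPred2),1);
for k = 1:numel(YPred2)
    [~, maxIndex] = max(YPred2{k}, [], 1);
    predictedLabelsAll{k} = labels(maxIndex);
end
disp(predictedLabelsAll)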

Approach 3: Generate PIL Executable for Stateful LSTM

lstmNetwork_predict_and_update Entry-Point Function

Instead of passing the entire time series to predict in one step, you can run prediction on the input by streaming in one time step at a time by using the predictAndUpdateState (Deep Learning Toolbox) function. This function accepts an input, produces an output prediction, and updates the internal state of the network so that future predictions take this initial input into account. Use this approach on resource-constrained hardware that does not have enough memory to operate on the entire time series.

The attached lstmNetwork_predict_and_update function accepts a single time-step input and processes the input by using the predictAndUpdateState function. The function outputs a prediction for the input time step and updates the network so that subsequent inputs are treated as subsequent time steps of the same observation. After you pass in all time steps, one at a time, the resulting output is the same as if all time steps were passed in as a single input.

type('lstmNetwork_predict_and_update.m')
function out = lstmNetwork_predict_and_update(in) %#codegen

% Copyright 2019-2021 The MathWorks, Inc. 

persistent mynet;

if isempty(mynet)
    mynet = coder.loadDeepLearningNetwork('activityRecognitionNet.mat');
end

% pass in input
[mynet, out] = predictAndUpdateState(mynet,in);

Specify Input Type and Size

To run the codegen command on this new design file, you must specify the type and size of the input argument to the entry-point function. Because each call of lstmNetwork_predict_and_update accepts a single time step, specify matrixInput to have a fixed sequence length of 1 instead of a variable sequence length.

matrixInput = coder.typeof(single(0),[3 1]);

Generate PIL Executable

Run the codegen command to generate code and PIL executable.

codegen -config cfg lstmNetwork_predict_and_update -args {matrixInput} -report

Run Generated PIL Executable

Load the MAT-file XValidateData.mat. This MAT-file stores the variable XValidateData, which contains sample time series of sensor readings on which you can test the generated code. Also, load the MAT-file labelsActivity.mat, which contains the activity labels.

load XValidateData.mat;
load labelsActivity.mat;

Get the sequence length of the first observation.

sequenceLength = size(XValidateData{1} ,2);

Run the generated PIL executable on the sample's first observation by looping over each time step.

for i = 1:sequenceLength
    % Get the data for this time step
    eachTimestepData = XValidateData{1}(:,i);
    YPredStateful(:,i) = lstmNetwork_predict_and_update_pil(eachTimestepData);
end

Clear the generated PIL executable after each observation so that the network state is reset before you run prediction on another observation.

clear lstmNetwork_predict_and_update_pil;
clear lstmNetwork_predict;

Associate the index of the maximum probability value with the corresponding label and display the predicted labels.

[~, maxIndex] = max(YPredStateful, [], 1);
predictedLabelsStateful = labels(maxIndex);
disp(predictedLabelsStateful)
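
Optionally, you can check the earlier claim that streaming the time steps one at a time reproduces the whole-sequence result. Assuming YPred1 from Approach 1 is still in the workspace, the stateful predictions for the first observation should match it to within a small numerical tolerance.

% Compare the streaming predictions with the whole-sequence predictions
% from Approach 1 (both are for the first observation).
maxDifference = max(abs(YPredStateful - YPred1), [], 'all')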
