Cheat Sheets

Embedded AI with MATLAB and Simulink

From concept to production, deploy AI on any embedded hardware.

Why Embedded AI with MATLAB and Simulink?


Deploy trained AI models to resource-constrained hardware—MCUs, GPUs, FPGAs, and NPUs—with system-level simulation and automated code generation.

  • System-level simulation: Test AI behavior alongside controllers, sensors, and plant models before touching hardware.
  • Code generation: Generate optimized C/C++, CUDA, or HDL directly from your Simulink model, including the AI component, with no manual porting.
  • Import flexibility: Bring in PyTorch, ONNX, or TensorFlow models and deploy them through the same pipeline.
  • Verification throughout: Verify your AI component at every stage with formal methods, adversarial robustness testing, and software-in-the-loop (SIL), processor-in-the-loop (PIL), and hardware-in-the-loop (HIL) testing.
  • Standards compliance: Generate MISRA C–compliant code with traceability to support DO-178C, ISO 26262, and IEC 61508 certification.

End-to-End Embedded AI Workflow


Prepare Data → Train/Import AI Model → Compress Model → Verify AI Model → Integrate in Simulink → Deploy & Verify

Train/Import AI Model


Train in MATLAB programmatically

% Train a deep learning network
net = trainnet(data, layers, "crossentropy", options);

% Train a machine learning model
mdl = fitcsvm(features, labels);
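The two calls above leave data, layers, options, features, and labels undefined. A minimal sketch of how they might be set up, assuming Deep Learning Toolbox and Statistics and Machine Learning Toolbox are installed. The synthetic two-class data, the layer sizes, and the training options are illustrative assumptions, not recommendations:

```matlab
% Synthetic two-class feature data (assumption: 4 features, 200 observations)
rng(0);
numObs = 200;
X = single([randn(numObs/2, 4) - 1; randn(numObs/2, 4) + 1]);
T = categorical([repmat("low", numObs/2, 1); repmat("high", numObs/2, 1)]);

% Small fully connected classifier (layer sizes are illustrative)
layers = [
    featureInputLayer(4)
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(2)
    softmaxLayer];

options = trainingOptions("adam", MaxEpochs=20, Verbose=false);

% Deep learning: numeric predictors with categorical targets
net = trainnet(X, T, layers, "crossentropy", options);

% Machine learning: binary SVM on the same features
mdl = fitcsvm(double(X), T);
```

Here the predictors and targets are passed as separate arrays; the single-argument data form shown above applies when both are packaged together, for example in a datastore.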

Train in MATLAB interactively

Import from external frameworks

% Import from PyTorch (exported program format)
net = importNetworkFromPyTorch("exported_pytorch_model.pt2")

net =
  dlnetwork with properties:

         Layers: [9×1 nnet.cnn.layer.Layer]
    Connections: [11×2 table]
     Learnables: [86×3 table]
          State: [42×3 table]
     InputNames: {'InputLayer1'}
    OutputNames: {'ResidualNetSmall:fc'}
    Initialized: 1

Source | Function
PyTorch (.pt2/.pt) | importNetworkFromPyTorch
ONNX | importNetworkFromONNX
TensorFlow 2 | importNetworkFromTensorFlow
Keras 3 | importNetworkFromKeras
XGBoost (.json) | importModelFromXGBoost

Compress Model


%% Step 1: Prune (e.g., remove 60% of learnables)
netPruned = compressNetworkUsingTaylorPruning(net, dsTrain, "crossentropy", ...
    options, LearnablesReductionGoal=0.6);

%% Step 2: Project (e.g., retain 80% variance)
npca = neuronPCA(netPruned, dsTrain);
netProjected = compressNetworkUsingProjection(netPruned, npca, ...
    ExplainedVarianceGoal=0.8);
% Fine-tune the projected network to recover accuracy
netProjected = trainnet(data, netProjected, "crossentropy", optionsFT);

%% Step 3: Quantize (INT8)
quantObj = dlquantizer(netProjected, ExecutionEnvironment="CPU");
calibrate(quantObj, dsCal);
netQuantized = quantize(quantObj);

Technique | Potential Model Size Reduction | When to Use
Pruning | 50–70% | Overparameterized CNNs with redundant filters
Projection | 20–85% | FC-heavy or recurrent networks with correlated activations
Quantization | 75% (4×) | Final step for fixed-point processors
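
The 75% (4×) figure for quantization follows from storing each 32-bit single-precision weight in 8 bits. A minimal sketch of that arithmetic in base MATLAB, using a generic symmetric per-tensor INT8 scheme. The scheme is an illustrative assumption and not necessarily what dlquantizer applies, and the weight matrix is random placeholder data:

```matlab
% Placeholder weights standing in for one layer's learnables (assumption)
rng(0);
W = single(randn(64, 128));

% Symmetric per-tensor quantization: map [-max|W|, +max|W|] onto [-127, 127]
scale = max(abs(W), [], "all") / 127;
Wq    = int8(round(W / scale));      % stored 8-bit values
Wdq   = single(Wq) * scale;          % dequantized approximation

% Storage: 4 bytes per single versus 1 byte per int8
bytesFP32 = numel(W) * 4;
bytesINT8 = numel(Wq) * 1;
sizeReduction = 1 - bytesINT8 / bytesFP32;   % 0.75, i.e. 4x smaller

% Rounding error is bounded by about half a quantization step
maxErr = max(abs(W - Wdq), [], "all");

% If 60% pruning and INT8 quantization compounded ideally (an assumption),
% the remaining size would be 0.4 * 0.25 = 10% of the original
combinedReduction = 1 - (1 - 0.6) * (1 - sizeReduction);

fprintf("Quantization reduction: %.0f%%\n", 100 * sizeReduction);
fprintf("Max round-trip error:   %.4g (step/2 = %.4g)\n", maxErr, scale / 2);
fprintf("Pruning + quantization: %.0f%%\n", 100 * combinedReduction);
```

Real networks rarely compound this cleanly, since not every learnable is pruned or quantized, so the combined figure is best read as an upper estimate.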

Verify AI Model


Prove safety properties or evaluate robustness before deployment using the AI Verification Library and the Deep Learning Toolbox Interface for alpha-beta-CROWN Verifier. Unlike testing on sampled inputs, formal verification provides mathematical guarantees over continuous input regions.

Technique | What It Does | Key Function
Robustness verification | Proves a network’s classification is invariant within a bounded input region | verifyNetworkRobustness
Formal output bounds | Computes guaranteed upper/lower bounds on network outputs for a bounded input region | estimateNetworkOutputBounds
Adversarial robustness | Finds adversarial examples that cause misclassification within a bounded input region | findAdversarialExamples
Out-of-distribution detection | Flags inputs unlike training data to prevent silent failures at runtime | networkDistributionDiscriminator

The first argument to each function is either a dlnetwork object (trained in MATLAB or imported) or a model file path: an ONNX file (.onnx) or a full PyTorch model (saved with torch.save()). The functions and their syntax are the same in both cases.

% Prove classification is robust to sensor noise around input X0
XLower = X0 - epsilon;
XUpper = X0 + epsilon;
[result, cex] = verifyNetworkRobustness(net, XLower, XUpper, trueLabel);

% Compute guaranteed output bounds over the input region
[YLower, YUpper] = estimateNetworkOutputBounds(net, XLower, XUpper);

% Find adversarial examples within bounded region
[adversarials, success] = findAdversarialExamples(net, XLower, XUpper, trueLabel);
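
To illustrate what a guaranteed output bound means, the sketch below propagates an input box through a tiny linear-ReLU-linear network with interval arithmetic in base MATLAB. It is a simplified stand-in chosen for illustration, not the algorithm that estimateNetworkOutputBounds or alpha-beta-CROWN uses, and the weights, X0, and epsilon are made-up values:

```matlab
% Made-up two-layer network: y = W2 * relu(W1 * x + b1) + b2
W1 = [1 -2; 0.5 1; -1 1];   b1 = [0; -0.5; 0.2];
W2 = [1 -1 0.5; -0.5 2 1];  b2 = [0.1; -0.2];

% Input region: box of half-width epsilon around X0
X0 = [0.3; -0.1];
epsilon = 0.05;
XLower = X0 - epsilon;
XUpper = X0 + epsilon;

% Affine layer: propagate the box as center +/- radius
c  = (XLower + XUpper) / 2;
r  = (XUpper - XLower) / 2;
zC = W1 * c + b1;
zR = abs(W1) * r;

% ReLU is monotonic, so it maps the bounds elementwise
hLower = max(zC - zR, 0);
hUpper = max(zC + zR, 0);

% Second affine layer, same center/radius rule
c2 = (hLower + hUpper) / 2;
r2 = (hUpper - hLower) / 2;
YLower = W2 * c2 + b2 - abs(W2) * r2;
YUpper = W2 * c2 + b2 + abs(W2) * r2;

% Sanity check: outputs for random points in the box stay inside the bounds
rng(0);
Xs = XLower + (XUpper - XLower) .* rand(2, 1000);
Ys = W2 * max(W1 * Xs + b1, 0) + b2;
allInside = all(Ys >= YLower - 1e-12 & Ys <= YUpper + 1e-12, "all");
disp(allInside)
```

The sampled check only confirms the bounds on 1000 points, whereas the interval computation covers every point in the box. Interval bounds can be loose, which is one reason dedicated verifiers use tighter relaxations.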

Integrate in Simulink


Embed AI models in system simulations to verify behavior alongside controllers, sensors, and plant models before generating code.

Block | Code | Use Case
Co-Execution | | Simulate PyTorch, TensorFlow, ONNX, or custom Python models directly in Simulink without conversion; evaluate how third-party AI performs within larger systems before full integration
Predict | Predict block | Run a dlnetwork as a single inference block (classification or regression)
PyTorch Exported Program | Co-Execution block | Run a PyTorch .pt2 model directly in Simulink with C/C++ and CUDA code generation
Layer Blocks | exportNetworkToSimulink | Export networks as individual Simulink blocks for per-layer fixed-point control and inspection

Deploy and Verify


Generate standalone source code that runs without MATLAB, then progressively verify on target hardware.

Code generation

Product | Output | Primary Targets | Target Library
MATLAB Coder | C/C++ | ARM Cortex-A, x86, any POSIX/RTOS | Standalone*, Intel oneDNN
Embedded Coder | Production C/C++ | NXP, Infineon, STMicro, Renesas MCUs, and more | Standalone*, CMSIS, CMSIS-NN
GPU Coder | CUDA C++ | NVIDIA Jetson Thor, Orin, Xavier, TX2 | Standalone*, TensorRT
Embedded Coder + HSP | Optimized C/C++ for NPU | Qualcomm Hexagon, Infineon PPU (AURIX TC4x) | Vendor NPU runtime
HDL Coder | VHDL/Verilog | AMD (Xilinx) FPGAs, Intel FPGAs | Deep Learning HDL Toolbox IP

*Set target deep learning library to 'none' to generate standalone ANSI/ISO C/C++ for any processor without dependencies on third-party libraries.

Entry-point function pattern

% Option 1: use a MATLAB dlnetwork
function out = myPredict(in) %#codegen
    persistent net
    if isempty(net)
        % Load the network once; later calls reuse it
        net = coder.loadDeepLearningNetwork('myNet.mat');
    end
    out = predict(net, in);
end

% Option 2: use a PyTorch model (alternative to option 1; keep one myPredict per file)
function out = myPredict(in) %#codegen
    persistent pytorchNet
    if isempty(pytorchNet)
        % Load the exported program once; later calls reuse it
        pytorchNet = loadPyTorchExportedProgram('myPyTorchNet.pt2');
    end
    out = invoke(pytorchNet, in);
end

Configure and generate

% Generate C++ for any processor
cfg = coder.config('lib');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('none');
codegen -config cfg myPredict -args {ones(224,224,3,'single')}

% Generate CUDA for NVIDIA Jetson
gpuCfg = coder.gpuConfig('lib');
gpuCfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');
codegen -config gpuCfg myPredict -args {ones(224,224,3,'single')}

Target Library | Hardware
'none' | Any (library-free)
'mkldnn' | x86-64 (Intel oneDNN)
'cudnn' | NVIDIA GPUs
'tensorrt' | NVIDIA GPUs/Jetson

System-level verification (MIL/SIL/PIL/HIL)

Verify progressively: model (MIL) → generated code on host (SIL) → target processor (PIL) → full system with real I/O (HIL).

Stage | What Runs | Where | What It Verifies
MIL (model-in-the-loop) | Simulink model (interpreted) | Host PC | Algorithm correctness: establishes golden reference
SIL (software-in-the-loop) | Generated C/C++/CUDA code | Host PC (compiled) | Behavioral correctness: numerical equivalence of generated code running on host processor
PIL (processor-in-the-loop) | Generated C/C++/CUDA code | Target hardware | Target-specific effects: compiler, FPU, numerical equivalence of generated code running on target processor
HIL (hardware-in-the-loop) | Full system with real I/O | Real-time target | Real-time effects: integration, timing, and I/O behavior

% Processor-in-the-loop verification
% (assumes AI_Subsystem is a Model block that references the AI component)
set_param("myModel/AI_Subsystem", "SimulationMode", "Processor-in-the-loop (PIL)");
out = sim("myModel");
% Compare PIL output against MIL baseline to detect numerical drift
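
The final comparison step could look like the following sketch in base MATLAB. The arrays yMIL and yPIL are placeholders standing in for logged outputs from the two runs, and the tolerances are illustrative assumptions to be tuned per application:

```matlab
% Placeholder data standing in for logged MIL and PIL outputs (assumption)
rng(0);
yMIL = rand(10, 100, "single");                          % baseline
yPIL = yMIL + single(1e-6) * randn(10, 100, "single");   % small numerical drift

% Illustrative tolerances
absTol = 1e-4;
relTol = 1e-3;

% Elementwise error against the baseline, with a mixed abs/rel limit
absErr = abs(double(yPIL) - double(yMIL));
limit  = absTol + relTol * abs(double(yMIL));
withinTol = all(absErr <= limit, "all");

fprintf("Max abs error: %.3g\n", max(absErr, [], "all"));
if withinTol
    disp("PIL output matches MIL baseline within tolerance");
else
    % Report where the limit is exceeded by the largest margin
    [~, idx] = max(absErr(:) - limit(:));
    fprintf("Largest violation at linear index %d\n", idx);
end
```

A mixed absolute and relative limit avoids flagging tiny differences near zero while still scaling with larger output magnitudes.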
