Embedded AI with MATLAB and Simulink

From concept to production, deploy AI on any embedded hardware.

Why Embedded AI with MATLAB and Simulink?

Deploy trained AI models to resource-constrained hardware—MCUs, GPUs, FPGAs, and NPUs—with system-level simulation and automated code generation.

System-level simulation: Test AI behavior alongside controllers, sensors, and plant models before touching hardware.
Code generation: Generate optimized C/C++, CUDA, or HDL directly from your Simulink model, including the AI component, with no manual porting.
Import flexibility: Bring in PyTorch, ONNX, or TensorFlow models and deploy them through the same pipeline.
Verification throughout: Verify your AI component at every stage with formal methods, adversarial robustness testing, and software-in-the-loop (SIL), processor-in-the-loop (PIL), and hardware-in-the-loop (HIL) testing.
Standards compliance: Generate MISRA C–compliant code with traceability to support DO-178C, ISO 26262, and IEC 61508 certification.

End-to-End Embedded AI Workflow

Prepare Data → Train/Import AI Model → Compress Model → Verify AI Model → Integrate in Simulink → Deploy and Verify

Iterative by nature:

This workflow is not strictly linear. Steps may be repeated, reordered, or skipped entirely depending on your project’s memory constraints, latency requirements, target hardware, and standards compliance needs.

Train/Import AI Model

Train in MATLAB programmatically

% Train a deep learning network
net = trainnet(data, layers, "crossentropy", options);

% Train a machine learning model
mdl = fitcsvm(features, labels);

Train in MATLAB interactively

Import from external frameworks

% Import from PyTorch (exported program format)
net = importNetworkFromPyTorch("exported_pytorch_model.pt2")

net =
  dlnetwork with properties:

         Layers: [9×1 nnet.cnn.layer.Layer]
    Connections: [11×2 table]
     Learnables: [86×3 table]
          State: [42×3 table]
     InputNames: {'InputLayer1'}
    OutputNames: {'ResidualNetSmall:fc'}
    Initialized: 1

Source	Function
PyTorch (.pt2/.pt)	`importNetworkFromPyTorch`
ONNX	`importNetworkFromONNX`
TensorFlow 2	`importNetworkFromTensorFlow`
Keras 3	`importNetworkFromKeras`
XGBoost (.json)	`importModelFromXGBoost`

Compress Model

%% Step 1: Prune (e.g., remove 60% of learnables)
netPruned = compressNetworkUsingTaylorPruning(net, dsTrain, "crossentropy", ...
    options, LearnablesReductionGoal=0.6);

%% Step 2: Project (e.g., retain 80% variance)
npca = neuronPCA(netPruned, dsTrain);
netProjected = compressNetworkUsingProjection(netPruned, npca, ...
    ExplainedVarianceGoal=0.8);
% Fine-tune the projected network to recover accuracy
netProjected = trainnet(dsTrain, netProjected, "crossentropy", optionsFT);

%% Step 3: Quantize (INT8)
quantObj = dlquantizer(netProjected, ExecutionEnvironment="CPU");
calibrate(quantObj, dsCal);
netQuantized = quantize(quantObj);

Technique	Potential Model Size Reduction	When to Use
Pruning	50–70%	Overparameterized CNNs with redundant filters
Projection	20–85%	FC-heavy or recurrent networks with correlated activations
Quantization	75% (4×)	Final step for fixed-point processors
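
The 4× figure for quantization follows from storing each value in 1 byte instead of 4. The sketch below shows that arithmetic in base MATLAB. It assumes symmetric per-tensor scaling, which is an illustrative scheme and not necessarily what dlquantizer applies, and the random weights are made up.

```matlab
% Hypothetical single-precision weights standing in for one layer's learnables.
rng(0);
w = single(randn(256, 128));

% Symmetric scaling: map the largest magnitude onto the int8 limit 127.
scale = max(abs(w), [], "all") / 127;
wInt8 = int8(round(w / scale));   % int8() saturates at [-128, 127]

% Dequantize to measure the rounding error introduced by quantization.
wHat = single(wInt8) * scale;
maxErr = max(abs(w - wHat), [], "all");

% Storage: 4 bytes per single value versus 1 byte per int8 value.
bytesSingle = numel(w) * 4;
bytesInt8 = numel(wInt8) * 1;
reduction = 1 - bytesInt8 / bytesSingle;   % fraction of storage saved

fprintf("Saved %.0f%% of storage, max rounding error %.4g\n", ...
    100 * reduction, maxErr);
```

The rounding error is bounded by about half of one quantization step (scale/2), which is why calibration data that reflects the real value range matters.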

Tip:

Recommended order: Prune → Project → Quantize (fine-tune after each step). Use estimateNetworkMetrics(net) to measure learnables, activation memory, and MACs before and after each step. See also the Deep Learning Toolbox Model Compression Library.
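
The before-and-after measurement in the tip can be sketched as below. The tiny network is a hypothetical stand-in for your own model, and the column names NumberOfLearnables and NumberOfMACs are assumed from the table that estimateNetworkMetrics returns, so check them against your release.

```matlab
% Hypothetical stand-in network; replace with your own dlnetwork.
layers = [
    imageInputLayer([28 28 1], Normalization="none")
    convolution2dLayer(3, 8, Padding="same")
    reluLayer
    fullyConnectedLayer(10)
    softmaxLayer];
net = dlnetwork(layers);

% Returns a table with one row per supported layer.
metrics = estimateNetworkMetrics(net);

% Totals to record as the baseline before compressing.
% (Column names are assumed; inspect metrics.Properties.VariableNames.)
baseLearnables = sum(metrics.NumberOfLearnables);
baseMACs = sum(metrics.NumberOfMACs);
fprintf("Learnables: %d, MACs: %g\n", baseLearnables, baseMACs);
```

Repeating the same call on netPruned, netProjected, and netQuantized gives comparable totals for each compression step.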

Verify AI Model

Prove safety properties or evaluate robustness before deployment using the AI Verification Library and the Deep Learning Toolbox Interface for alpha-beta-CROWN Verifier. Unlike testing on sampled inputs, formal verification provides mathematical guarantees over continuous input regions.

Technique	What It Does	Key Function
Robustness verification	Proves a network’s classification is invariant within a bounded input region	`verifyNetworkRobustness`
Formal output bounds	Computes guaranteed upper/lower bounds on network outputs for a bounded input region	`estimateNetworkOutputBounds`
Adversarial robustness	Finds adversarial examples that cause misclassification within a bounded input region	`findAdversarialExamples`
Out-of-distribution detection	Flags inputs unlike training data to prevent silent failures at runtime	`networkDistributionDiscriminator`

The first argument of each function is either a dlnetwork object (trained in MATLAB or imported) or a path to a model file: an ONNX file (.onnx) or a full PyTorch model saved with torch.save(). The functions and syntax are the same in both cases.

% Prove classification is robust to sensor noise around input X0
XLower = X0 - epsilon;
XUpper = X0 + epsilon;
[result, cex] = verifyNetworkRobustness(net, XLower, XUpper, trueLabel);

% Compute guaranteed output bounds over the input region
[YLower, YUpper] = estimateNetworkOutputBounds(net, XLower, XUpper);

% Find adversarial examples within bounded region
[adversarials, success] = findAdversarialExamples(net, XLower, XUpper, trueLabel);
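
To make "guaranteed bounds over an input region" concrete, the sketch below propagates an input box through a toy one-layer network using interval arithmetic in base MATLAB. This is an illustration of the idea only, not the algorithm the verification functions use, and the weights, X0, and epsilon are made up.

```matlab
% Toy network y = relu(W*x + b); weights are invented for illustration.
W = [1 -2; 0.5 1];
b = [0; -1];

% Input region: a box of radius epsilon around X0.
X0 = [1; 2];
epsilon = 0.1;
XLower = X0 - epsilon;
XUpper = X0 + epsilon;

% Interval arithmetic for an affine map: positive weights carry lower
% bounds to lower bounds, negative weights swap lower and upper.
Wpos = max(W, 0);
Wneg = min(W, 0);
ZLower = Wpos*XLower + Wneg*XUpper + b;
ZUpper = Wpos*XUpper + Wneg*XLower + b;

% ReLU is monotonic, so it can be applied to each bound directly.
YLower = max(ZLower, 0);
YUpper = max(ZUpper, 0);

% Sanity check: random samples from the box never escape the bounds.
rng(0);
Xs = XLower + (XUpper - XLower) .* rand(2, 1000);
Ys = max(W*Xs + b, 0);
allInside = all(Ys >= YLower - 1e-12 & Ys <= YUpper + 1e-12, "all");
```

Sampling can only fail to find a violation, whereas the interval bounds hold for every point in the box. That difference is what separates formal verification from testing on sampled inputs.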

Integrate in Simulink

Embed AI models in system simulations to verify behavior alongside controllers, sensors, and plant models before generating code.

Block	Code	Use Case
Co-Execution		Simulate PyTorch, TensorFlow, ONNX, or custom Python models directly in Simulink without conversion; evaluate how third-party AI performs within larger systems before full integration
Predict		Run a `dlnetwork` as a single inference block (classification or regression)
PyTorch Exported Program		Run a PyTorch `.pt2` model directly in Simulink with C/C++ and CUDA code generation
Layer Blocks	`exportNetworkToSimulink`	Export networks as individual Simulink blocks for per-layer fixed-point control and inspection

Deploy and Verify

Generate standalone source code that runs without MATLAB, then progressively verify on target hardware.

Code generation

Product	Output	Primary Targets	Target Library
MATLAB Coder	C/C++	ARM Cortex-A, x86, any POSIX/RTOS	Standalone*, Intel oneDNN
Embedded Coder	Production C/C++	NXP, Infineon, STMicro, Renesas MCUs, and more	Standalone*, CMSIS, CMSIS-NN
GPU Coder	CUDA C++	NVIDIA Jetson Thor, Orin, Xavier, TX2	Standalone*, TensorRT
Embedded Coder + HSP	Optimized C/C++ for NPU	Qualcomm Hexagon, Infineon PPU (AURIX TC4x)	Vendor NPU runtime
HDL Coder	VHDL/Verilog	AMD (Xilinx) FPGAs, Intel FPGAs	Deep Learning HDL Toolbox IP

*Set target deep learning library to 'none' to generate standalone ANSI/ISO C/C++ for any processor without dependencies on third-party libraries.

Entry-point function pattern

% Use MATLAB dlnetwork
function out = myPredict(in) %#codegen
    persistent net
    if isempty(net)
        net = coder.loadDeepLearningNetwork('myNet.mat');
    end
    out = predict(net, in);
end

% Use PyTorch model
function out = myPredict(in) %#codegen
    persistent pytorchNet
    if isempty(pytorchNet)
        pytorchNet = loadPyTorchExportedProgram('myPyTorchNet.pt2');
    end
    out = invoke(pytorchNet, in);
end

Configure and generate

% Generate C++ for any processor
cfg = coder.config('lib');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('none');
codegen -config cfg myPredict -args {ones(224,224,3,'single')}

% Generate CUDA for NVIDIA Jetson
gpuCfg = coder.gpuConfig('lib');
gpuCfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');
codegen -config gpuCfg myPredict -args {ones(224,224,3,'single')}

Target Library	Hardware
`'none'`	Any (library-free)
`'mkldnn'`	x86-64 (Intel oneDNN)
`'cudnn'`	NVIDIA GPUs
`'tensorrt'`	NVIDIA GPUs/Jetson

System-level verification (MIL/SIL/PIL/HIL)

Verify progressively: model (MIL) → generated code on host (SIL) → target processor (PIL) → full system with real I/O (HIL).

Stage	What Runs	Where	What It Verifies
MIL (model-in-the-loop)	Simulink model (interpreted)	Host PC	Algorithm correctness: establishes golden reference
SIL (software-in-the-loop)	Generated C/C++/CUDA code	Host PC (compiled)	Behavioral correctness: numerical equivalence of generated code running on host processor
PIL (processor-in-the-loop)	Generated C/C++/CUDA code	Target hardware	Target-specific effects: compiler, FPU, numerical equivalence of generated code running on target processor
HIL (hardware-in-the-loop)	Full system with real I/O	Real-time target	Real-time effects: integration, timing, and I/O behavior

% Processor-in-the-loop verification (assumes AI_Subsystem is a Model block)
set_param("myModel/AI_Subsystem", "SimulationMode", "Processor-in-the-loop (PIL)");
out = sim("myModel");
% Compare PIL output against MIL baseline to detect numerical drift
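
The comparison against the MIL baseline can be sketched as follows, assuming the MIL and PIL outputs have already been extracted from the simulation results into numeric arrays. The names yMIL and yPIL, the sample values, and the tolerances are placeholders rather than values from a real model.

```matlab
% Placeholder data standing in for logged outputs of the two runs.
yMIL = single([0.12; 0.85; 0.03]);
yPIL = yMIL + single([1e-7; -2e-7; 0]);

% Hypothetical tolerances; choose them from your accuracy requirements.
absTol = 1e-5;
relTol = 1e-4;

% Element-wise error checked against a combined absolute/relative limit.
err = abs(double(yPIL) - double(yMIL));
limit = absTol + relTol * abs(double(yMIL));
withinTol = all(err <= limit);
maxErr = max(err);

% For classifiers, also confirm the predicted class is unchanged.
[~, classMIL] = max(yMIL);
[~, classPIL] = max(yPIL);
sameClass = (classMIL == classPIL);

fprintf("Max error %.3g, within tolerance: %d, same class: %d\n", ...
    maxErr, withinTol, sameClass);
```

A combined absolute and relative limit avoids false alarms on outputs near zero while still scaling with larger values.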