Embedded AI with MATLAB and Simulink
From concept to production, deploy AI on any embedded hardware.
Why Embedded AI with MATLAB and Simulink?
Deploy trained AI models to resource-constrained hardware—MCUs, GPUs, FPGAs, and NPUs—with system-level simulation and automated code generation.
- System-level simulation: Test AI behavior alongside controllers, sensors, and plant models before touching hardware.
- Code generation: Generate optimized C/C++, CUDA, or HDL directly from your Simulink model, including the AI component, with no manual porting.
- Import flexibility: Bring in PyTorch, ONNX, or TensorFlow models and deploy them through the same pipeline.
- Verification throughout: Verify your AI component at every stage with formal methods, adversarial robustness testing, and software-in-the-loop (SIL), processor-in-the-loop (PIL), and hardware-in-the-loop (HIL) testing.
- Standards compliance: Generate MISRA C–compliant code with traceability to support DO-178C, ISO 26262, and IEC 61508 certification.
End-to-End Embedded AI Workflow
[Workflow diagram: Data → Train/Import AI Model → Compress Model → Verify AI Model → Integrate in Simulink → Deploy and Verify]
Iterative by nature:
This workflow is not strictly linear. Steps may be repeated, reordered, or skipped entirely depending on your project’s memory constraints, latency requirements, target hardware, and standards compliance needs.
Train/Import AI Model
Train in MATLAB programmatically
% Train a deep learning network
net = trainnet(data, layers, "crossentropy", options);

% Train a machine learning model
mdl = fitcsvm(features, labels);
Train in MATLAB interactively
Import from external frameworks
% Import from PyTorch (exported program format)
net = importNetworkFromPyTorch("exported_pytorch_model.pt2")

net =
  dlnetwork with properties:

         Layers: [9×1 nnet.cnn.layer.Layer]
    Connections: [11×2 table]
     Learnables: [86×3 table]
          State: [42×3 table]
     InputNames: {'InputLayer1'}
    OutputNames: {'ResidualNetSmall:fc'}
    Initialized: 1
| Source | Function |
|---|---|
| PyTorch (.pt2/.pt) | importNetworkFromPyTorch |
| ONNX | importNetworkFromONNX |
| TensorFlow 2 | importNetworkFromTensorFlow |
| Keras 3 | importNetworkFromKeras |
| XGBoost (.json) | importModelFromXGBoost |
Compress Model
%% Step 1: Prune (e.g., remove 60% of learnables)
netPruned = compressNetworkUsingTaylorPruning(net, dsTrain, "crossentropy", ...
    options, LearnablesReductionGoal=0.6);

%% Step 2: Project (e.g., retain 80% variance)
npca = neuronPCA(netPruned, dsTrain);
netProjected = compressNetworkUsingProjection(netPruned, npca, ...
    ExplainedVarianceGoal=0.8);
netProjected = trainnet(data, netProjected, "crossentropy", optionsFT);

%% Step 3: Quantize (INT8)
quantObj = dlquantizer(netProjected, ExecutionEnvironment="CPU");
calibrate(quantObj, dsCal);
netQuantized = quantize(quantObj);
| Technique | Potential Model Size Reduction | When to Use |
|---|---|---|
| Pruning | 50–70% | Overparameterized CNNs with redundant filters |
| Projection | 20–85% | FC-heavy or recurrent networks with correlated activations |
| Quantization | 75% (4×) | Final step for fixed-point processors |
Tip:
Recommended order: Prune → Project → Quantize (fine-tune after each step). Use estimateNetworkMetrics(net) to measure learnables, activation memory, and MACs before and after each step. See also the Deep Learning Toolbox Model Compression Library.
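The 75% (4×) figure for quantization in the table above follows from storage width alone, assuming the learnables start as 32-bit single-precision values and end as 8-bit integers. The sketch below works through that arithmetic for a hypothetical network of one million learnables and, as a further illustration, combines it with a 60% pruning goal. It is an idealized estimate: it ignores quantization scale factors, layers left unquantized, and activation memory, so measured results will differ.

```matlab
% Idealized size arithmetic; numLearnables is a hypothetical value.
numLearnables  = 1e6;
bytesPerSingle = 4;   % 32-bit floating point
bytesPerInt8   = 1;   % 8-bit integer

sizeFP32 = numLearnables * bytesPerSingle;   % baseline size in bytes
sizeINT8 = numLearnables * bytesPerInt8;     % size after INT8 quantization

quantReduction = 1 - sizeINT8 / sizeFP32;    % fraction of memory saved
quantFactor    = sizeFP32 / sizeINT8;        % compression factor

% Assumed: pruning removes 60% of learnables before quantization.
prunedLearnables  = numLearnables * (1 - 0.6);
sizePrunedINT8    = prunedLearnables * bytesPerInt8;
combinedReduction = 1 - sizePrunedINT8 / sizeFP32;

fprintf("Quantization alone: %.0f%% smaller (%gx)\n", 100*quantReduction, quantFactor);
fprintf("Pruning + quantization: %.0f%% smaller\n", 100*combinedReduction);
```

On a real network, estimateNetworkMetrics gives measured numbers to compare against this back-of-the-envelope estimate.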
Verify AI Model
Prove safety properties or evaluate robustness before deployment using the AI Verification Library and the Deep Learning Toolbox Interface for alpha-beta-CROWN Verifier. Unlike testing on sampled inputs, formal verification provides mathematical guarantees over continuous input regions.
| Technique | What It Does | Key Function |
|---|---|---|
| Robustness verification | Proves a network’s classification is invariant within a bounded input region | verifyNetworkRobustness |
| Formal output bounds | Computes guaranteed upper/lower bounds on network outputs for a bounded input region | estimateNetworkOutputBounds |
| Adversarial robustness | Finds adversarial examples that cause misclassification within a bounded input region | findAdversarialExamples |
| Out-of-distribution detection | Flags inputs unlike training data to prevent silent failures at runtime | networkDistributionDiscriminator |
The first argument is either a dlnetwork object (trained in MATLAB or imported) or a path to a model file: an ONNX file (.onnx) or a full PyTorch model (saved with torch.save()). The functions and their syntax are the same in both cases.
% Prove classification is robust to sensor noise around input X0
XLower = X0 - epsilon;
XUpper = X0 + epsilon;
[result, cex] = verifyNetworkRobustness(net, XLower, XUpper, trueLabel);

% Compute guaranteed output bounds over the input region
[YLower, YUpper] = estimateNetworkOutputBounds(net, XLower, XUpper);

% Find adversarial examples within bounded region
[adversarials, success] = findAdversarialExamples(net, XLower, XUpper, trueLabel);
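To make the idea of guaranteed bounds over a continuous input region concrete, the sketch below propagates the box [XLower, XUpper] through a tiny two-layer ReLU network using interval arithmetic. The weights, X0, and epsilon are made-up values, and this is only an illustration of the general idea using base MATLAB; it is not a description of how estimateNetworkOutputBounds works internally. Interval propagation gives sound but often loose bounds.

```matlab
% Hypothetical network: y = W2 * relu(W1 * x + b1) + b2
W1 = [1 -1; 0.5 2];  b1 = [0; -1];
W2 = [1 1];          b2 = 0.5;

% Bounded input region: a box of half-width epsilon around X0
X0 = [1; 0.5];
epsilon = 0.1;
XLower = X0 - epsilon;
XUpper = X0 + epsilon;

% Layer 1 (affine): positive weights carry lower bounds to lower bounds,
% negative weights swap the roles of the lower and upper bounds.
z1Lower = max(W1,0)*XLower + min(W1,0)*XUpper + b1;
z1Upper = max(W1,0)*XUpper + min(W1,0)*XLower + b1;

% ReLU is monotonic, so it can be applied to each bound directly
a1Lower = max(z1Lower, 0);
a1Upper = max(z1Upper, 0);

% Layer 2 (affine)
YLower = max(W2,0)*a1Lower + min(W2,0)*a1Upper + b2;
YUpper = max(W2,0)*a1Upper + min(W2,0)*a1Lower + b2;

% Sanity check: random samples from the region must fall inside the bounds
rng(0);
Xs = XLower + (XUpper - XLower) .* rand(2, 1000);
Ys = W2 * max(W1*Xs + b1, 0) + b2;
allInside = all(Ys >= YLower - 1e-12 & Ys <= YUpper + 1e-12);

fprintf("Output bounds: [%.4f, %.4f], all samples inside: %d\n", ...
    YLower, YUpper, allInside);
```

The sampled check can only fail to find a violation. The interval bounds hold for every point in the region, which is the distinction between testing and formal verification drawn above.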
Integrate in Simulink
Embed AI models in system simulations to verify behavior alongside controllers, sensors, and plant models before generating code.
| Block | Code | Use Case |
|---|---|---|
| Co-Execution | | Simulate PyTorch, TensorFlow, ONNX, or custom Python models directly in Simulink without conversion; evaluate how third-party AI performs within larger systems before full integration |
| Predict | | Run a dlnetwork as a single inference block (classification or regression) |
| PyTorch Exported Program | | Run a PyTorch .pt2 model directly in Simulink with C/C++ and CUDA code generation |
| Layer Blocks | exportNetworkToSimulink | Export networks as individual Simulink blocks for per-layer fixed-point control and inspection |
Deploy and Verify
Generate standalone source code that runs without MATLAB, then progressively verify on target hardware.
Code generation
| Product | Output | Primary Targets | Target Library |
|---|---|---|---|
| MATLAB Coder | C/C++ | ARM Cortex-A, x86, any POSIX/RTOS | Standalone*, Intel oneDNN |
| Embedded Coder | Production C/C++ | NXP, Infineon, STMicro, Renesas MCUs, and more | Standalone*, CMSIS, CMSIS-NN |
| GPU Coder | CUDA C++ | NVIDIA Jetson Thor, Orin, Xavier, TX2 | Standalone*, TensorRT |
| Embedded Coder + HSP | Optimized C/C++ for NPU | Qualcomm Hexagon, Infineon PPU (AURIX TC4x) | Vendor NPU runtime |
| HDL Coder | VHDL/Verilog | AMD (Xilinx) FPGAs, Intel FPGAs | Deep Learning HDL Toolbox IP |
*Set target deep learning library to 'none' to generate standalone ANSI/ISO C/C++ for any processor without dependencies on third-party libraries.
Entry-point function pattern
% Use MATLAB dlnetwork
function out = myPredict(in) %#codegen
    persistent net
    if isempty(net)
        net = coder.loadDeepLearningNetwork('myNet.mat');
    end
    out = predict(net, in);
end

% Use PyTorch model
function out = myPredict(in) %#codegen
    persistent pytorchNet
    if isempty(pytorchNet)
        pytorchNet = loadPyTorchExportedProgram('myPyTorchNet.pt2');
    end
    out = invoke(pytorchNet, in);
end
Configure and generate
% Generate C++ for any processor
cfg = coder.config('lib');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('none');
codegen -config cfg myPredict -args {ones(224,224,3,'single')}

% Generate CUDA for NVIDIA Jetson
gpuCfg = coder.gpuConfig('lib');
gpuCfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');
codegen -config gpuCfg myPredict -args {ones(224,224,3,'single')}
| Target Library | Hardware |
|---|---|
| 'none' | Any (library-free) |
| 'mkldnn' | x86-64 (Intel oneDNN) |
| 'cudnn' | NVIDIA GPUs |
| 'tensorrt' | NVIDIA GPUs/Jetson |
System-level verification (MIL/SIL/PIL/HIL)
Verify progressively: model (MIL) → generated code on host (SIL) → target processor (PIL) → full system with real I/O (HIL).
| Stage | What Runs | Where | What It Verifies |
|---|---|---|---|
| MIL (model-in-the-loop) | Simulink model (interpreted) | Host PC | Algorithm correctness: establishes golden reference |
| SIL (software-in-the-loop) | Generated C/C++/CUDA code | Host PC (compiled) | Behavioral correctness: numerical equivalence of generated code running on host processor |
| PIL (processor-in-the-loop) | Generated C/C++/CUDA code | Target hardware | Target-specific effects: compiler, FPU, numerical equivalence of generated code running on target processor |
| HIL (hardware-in-the-loop) | Full system with real I/O | Real-time target | Real-time effects: integration, timing, and I/O behavior |
% Processor-in-the-loop (PIL) verification
set_param("myModel/AI_Subsystem", "SimulationMode", "Processor-in-the-loop (PIL)");
out = sim("myModel");
% Compare PIL output against MIL baseline to detect numerical drift
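The comparison mentioned in the final comment can be sketched as follows. The arrays yMIL and yPIL are hypothetical stand-ins for logged outputs of the two runs (how you extract them from the simulation output depends on your model's logging setup), and the tolerance is an assumed value that should be chosen for your data type and application.

```matlab
% Hypothetical logged outputs: rows are time steps, columns are class scores.
yMIL = single([0.12 0.85 0.03; 0.40 0.35 0.25]);      % MIL baseline
yPIL = yMIL + single([1e-6 -2e-6 0; 0 3e-6 -1e-6]);   % synthetic target-side drift

% Absolute error, computed in double to avoid adding rounding of its own
absErr    = abs(double(yPIL) - double(yMIL));
maxAbsErr = max(absErr, [], "all");

% Relative error, guarded against division by values near zero
relErr    = absErr ./ max(abs(double(yMIL)), eps("single"));
maxRelErr = max(relErr, [], "all");

% Assumed tolerance for single-precision inference
tol = 1e-4;
withinTol = maxAbsErr <= tol;

% For classifiers, also check that the predicted class is unchanged
[~, clsMIL] = max(yMIL, [], 2);
[~, clsPIL] = max(yPIL, [], 2);
sameClass = isequal(clsMIL, clsPIL);

fprintf("max abs err = %.3g, max rel err = %.3g, within tol = %d, same class = %d\n", ...
    maxAbsErr, maxRelErr, withinTol, sameClass);
```

Checking both the numeric error and the predicted class is a deliberate choice: small numeric drift can be acceptable on its own, but a changed decision near a class boundary usually is not.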