Generate SIMD Code from Simulink Blocks for ARM Platforms

You can generate single instruction, multiple data (SIMD) code from certain Simulink® blocks by using ARM® Neon technology. SIMD is a computing paradigm in which a single instruction processes multiple data elements at once. Many modern processors have SIMD instructions that, for example, perform several additions or multiplications in parallel. For computationally intensive operations on supported blocks, SIMD intrinsics can significantly improve the performance of the generated code on ARM Cortex®-A platforms.
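To make the idea concrete, the following is a minimal hand-written sketch, not code produced by the code generator. The function name add4_i32 and the HAVE_NEON macro are hypothetical names chosen for illustration. When the compiler targets Neon, a single vaddq_s32 intrinsic performs four 32-bit integer additions; on other targets the sketch falls back to a scalar loop so that it still compiles.

```c
#include <stdint.h>

/* HAVE_NEON is a hypothetical helper macro for this sketch only. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Adds four pairs of int32 values (illustrative helper, not generated code). */
void add4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
#if HAVE_NEON
  int32x4_t va = vld1q_s32(a);        /* load four lanes from a */
  int32x4_t vb = vld1q_s32(b);        /* load four lanes from b */
  vst1q_s32(out, vaddq_s32(va, vb));  /* one intrinsic, four additions */
#else
  /* Scalar fallback: one addition per loop iteration. */
  for (int i = 0; i < 4; i++) {
    out[i] = a[i] + b[i];
  }
#endif
}
```

Calling add4_i32 with a = {1, 2, 3, 4} and b = {10, 20, 30, 40} stores {11, 22, 33, 44} in out on either code path.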

To generate SIMD code for Intel® platforms, see Generate SIMD Code from Simulink Blocks for Intel Platforms.

Blocks That Support SIMD Code Generation for ARM

When certain conditions are met, you can generate SIMD code by using ARM Neon technology. This table lists blocks that support SIMD code generation. The table also details the conditions under which the support is available.

Block
  • Conditions

Add
  • The input signal is of data type single, int8, int16, int32, int64, uint8, uint16, or uint32.

Subtract
  • The input signal is of data type single, int8, int16, int32, int64, uint8, uint16, or uint32.

Sum of Elements
  • The input signal is of data type single, int8, int16, int32, int64, uint8, uint16, or uint32.
  • The Optimize reductions configuration parameter is set to on.

Product
  • The input signal is of data type single, int8, int16, int32, uint8, uint16, or uint32.

Product of Elements
  • The input signal is of data type single, int8, int16, int32, uint8, uint16, or uint32.
  • The Optimize reductions configuration parameter is set to on.

Gain
  • The input signal is of data type single, int8, int16, int32, uint8, uint16, or uint32.
  • The Multiplication parameter is set to Element-wise(.*).

Abs
  • The input signal is of data type single.

MinMax
  • The input signal is of data type single, int8, int16, int32, uint8, uint16, or uint32.

MinMax of Elements
  • The input signal is of data type single, int8, int16, int32, uint8, uint16, or uint32.
  • The Optimize reductions configuration parameter is set to on.

MATLAB Function
  • The MATLAB code meets the conditions specified in this topic: Generate SIMD Code from MATLAB Functions for ARM Platforms.

For Each Subsystem
  • The For Each Subsystem block contains a block listed in this table that meets the specified conditions.
  • The value of the Partition Dimension block parameter is above the value of the Loop unrolling threshold configuration parameter.

Bitwise Operator
  • The value of the Operator block parameter is AND, OR, or XOR.
  • The input signal is of data type int8, int16, int32, int64, uint8, uint16, or uint32.

Shift Arithmetic
  • The input signal is of data type int8, int16, int32, or int64.
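The reduction blocks in this table (Sum of Elements, Product of Elements, and MinMax of Elements) combine all elements of a signal into one value. The following hand-written sketch shows one common way to vectorize a sum with Neon intrinsics. It is an illustration only, not the code that the code generator emits. The function name sum_f32 and the HAVE_NEON macro are hypothetical. The sketch keeps four partial sums in one vector register, combines them at the end, and handles leftover elements with a scalar loop.

```c
#include <stddef.h>

/* HAVE_NEON is a hypothetical helper macro for this sketch only. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Sums n single-precision values (illustrative helper, not generated code). */
float sum_f32(const float *x, size_t n)
{
  size_t i = 0;
  float total = 0.0f;
#if HAVE_NEON
  float lanes[4];
  float32x4_t acc = vdupq_n_f32(0.0f);      /* four partial sums, all zero */
  for (; i + 4 <= n; i += 4) {
    acc = vaddq_f32(acc, vld1q_f32(&x[i])); /* add four elements per step */
  }
  vst1q_f32(lanes, acc);                    /* spill the partial sums */
  total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif
  /* Scalar loop: handles the tail, or everything when Neon is unavailable. */
  for (; i < n; i++) {
    total += x[i];
  }
  return total;
}
```

The vector path adds the elements in a different order than a plain scalar loop does. For floating-point data, that reordering can change the rounding of the result slightly.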

If you have DSP System Toolbox™, you can also generate SIMD code from certain DSP System Toolbox blocks. For more information, see Simulink Blocks in DSP System Toolbox that Support SIMD Code Generation (DSP System Toolbox).

Generate SIMD Code for ARM Compared to Plain C Code

For this example, create a simple model named simdDemo that contains a Subtract block. Each input signal of the Subtract block has 240 elements and is of data type single.

[Figure: Simulink model containing a Subtract block.]

The plain generated C code for this model is:

void simdDemo_step(void)
{
  int32_T i;
  for (i = 0; i < 240; i++) {
    simdDemo_Y.Out1[i] = simdDemo_U.In1[i] - simdDemo_U.In2[i];
  }
}

In the plain (non-SIMD) C code, each loop iteration produces one result.

To generate SIMD code:

  1. Open the Embedded Coder® app.

  2. Click Settings > Hardware Implementation.

  3. Set the Device vendor parameter to ARM Compatible.

  4. Set the Device type parameter to ARM Cortex-A (32-bit) or ARM Cortex-A (64-bit).

  5. On the Optimization pane, for the Leverage target hardware instruction set extensions parameter, select Neon v7. The Neon v7 instruction set supports target hardware ARMv7 and above, including ARMv8 and ARMv9.

  6. Optionally, select the Optimize reductions parameter to generate SIMD code for reduction operations, or select the FMA parameter to generate SIMD code for fused multiply-add operations.

  7. Generate code from the model.

In the generated SIMD code, the loop becomes:

  for (i = 0; i <= 236; i += 4) {
    vst1q_f32(&simdDemo_Y.Out1[i], vsubq_f32(vld1q_f32(&simdDemo_U.In1[i]),
               vld1q_f32(&simdDemo_U.In2[i])));
  }

The SIMD instructions are the intrinsic functions whose names start with the prefix v. Here, vld1q_f32 loads four single values into a 128-bit Neon register, vsubq_f32 subtracts the four pairs of values in one operation, and vst1q_f32 stores the four results. Because each loop iteration processes four elements, the loop increments by four and runs 60 times instead of 240. For models that process more data and are computationally more intensive than this one, SIMD instructions can significantly reduce the execution time of the generated code.
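In this example, 240 is a multiple of four, so the vector loop covers every element. The following hand-written sketch shows how the same subtraction can be written for an arbitrary length, with a scalar loop that picks up any leftover elements. The function name subtract_f32 and the HAVE_NEON macro are hypothetical, and this topic does not state how the code generator handles lengths that are not a multiple of four, so treat the tail loop as an assumption for illustration. On targets without Neon, the sketch falls back to the scalar loop for all elements.

```c
#include <stddef.h>

/* HAVE_NEON is a hypothetical helper macro for this sketch only. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Computes y[i] = a[i] - b[i] for n elements
 * (illustrative helper, not generated code). */
void subtract_f32(const float *a, const float *b, float *y, size_t n)
{
  size_t i = 0;
#if HAVE_NEON
  /* Vector loop: four subtractions per iteration. */
  for (; i + 4 <= n; i += 4) {
    vst1q_f32(&y[i], vsubq_f32(vld1q_f32(&a[i]), vld1q_f32(&b[i])));
  }
#endif
  /* Scalar loop: handles the remaining 0 to 3 elements after the vector
   * loop, or all n elements when Neon is unavailable. */
  for (; i < n; i++) {
    y[i] = a[i] - b[i];
  }
}
```

For n = 240, the vector loop runs 60 times and the scalar loop does not run. For n = 242, the vector loop still runs 60 times and the scalar loop processes the last two elements.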

See Also