Loop Rolling

One of the optimization features of the Target Language Compiler is the intrinsic support for loop rolling. Based on a specified threshold, code generation for looping operations can be unrolled or left as a loop (rolled).

Coupled with loop rolling is the concept of noncontiguous signals. Consider the following model:

The input to the timestwo S-function comes from two arrays located at two different memory locations, one for the output of source1 and one for the output of block source2. This is because of an optimization that makes the Mux block virtual, meaning that code is not explicitly generated for the Mux block and thus processor cycles are not spent evaluating it (i.e., it becomes a pure graphical convenience for the block diagram). So this is represented in the model.rtw file in this case as

Block {
      Type		      "S-Function"
      MaskType		      "S-function: timestwo"
      BlockIdx		      [0, 0, 2]
      SL_BlockIdx	      2
      GrSrc		      [0, 1]
      ExprCommentInfo {
	SysIdxList		[]
	BlkIdxList		[]
	PortIdxList		[]
      }
      ExprCommentSrcIdx {
	SysIdx			-1
	BlkIdx			-1
	PortIdx			-1
      }
      Name		      "<Root>/timestwo  C-MEX S-Function"
      SLName		      "<Root>/timestwo \nC-MEX S-Function"
      Identifier	      timestwoCMEXSFunction
      TID		      0
      RollRegions	      [0:19, 20:49]
      NumDataInputPorts	      1
      DataInputPort {
	SignalSrc		[b0@20, b1@30]
	SignalOffset		[0:19, 0:29]
	Width			50
	RollRegions		[0:19, 20:49]
      }
      NumDataOutputPorts      1
      DataOutputPort {
	SignalSrc		[b2@50]
	SignalOffset		[0:49]
	Width			50
      }
      Connections {
	InputPortContiguous	[no]
	InputPortConnected	[yes]
	OutputPortConnected	[yes]
	OutputPortBeingMerged	[no]
	DirectSrcConn		[no]
	DirectDstConn		[yes]
	DataOutputPort {
	  NumConnPoints		  1
	  ConnPoint {
	    SrcSignal		    [0, 50]
	    DstBlockAndPortEl	    [0, 4, 0, 0]
	  }
	}
  }
.
.
.

From this fragment of the model.rtw file you can see that the block and input port RollRegion entries are not just one number, but two groups of numbers. This denotes two groupings in memory for the input signal. The generated code looks like this:

/* S-Function Block: <Root>/timestwo  C-MEX S-Function */
  /* Multiply input by two */
  {
    int_T i1;

    const real_T *u0 = &contig_sample_B.u[0];
    real_T *y0 = contig_sample_B.timestwoCMEXSFunction_m;

    for (i1=0; i1 < 20; i1++) {
      y0[i1] = u0[i1] * 2.0;
    }

    u0 = &contig_sample_B.u_o[0];
    y0 = &contig_sample_B.timestwoCMEXSFunction_m[20];

    for (i1=0; i1 < 30; i1++) {
      y0[i1] = u0[i1] * 2.0;
    }
}

Notice that two loops are generated and between them the input signal is redirected from the first base address, &contig_sample_B.u[0], to the second base address of the signals, &contig_sample_B.u_o[0]. If you do not want to support this in your S-function or your generated code, you can use

ssSetInputPortRequiredContiguous(S, 1);

in the mdlInitializeSizes function to cause Simulink^® to implicitly generate code that performs a buffering operation. This option uses both extra memory and CPU cycles at run-time, but might be worth it if your algorithm performance increases enough to offset the overhead of the buffering.

Use the %roll directive to generate loops. See also %roll for the reference entry for %roll, and Input Signal Functions for a discussion on the behavior of %roll.