Working with Big Data

This example shows how Simulink® models handle big data as input to and output from a simulation.

Big data refers to data that is too large to load into system memory all at once. Simulink simulations can produce big data as simulation output and consume big data as simulation input. In both cases, the data is stored in a MAT file on the hard disk, and only small chunks of it are loaded into system memory at any time during simulation. This approach is known as streaming. Streaming solves memory issues because the hard disk capacity of a system is typically much greater than the capacity of its random access memory (RAM).

The software uses logging to file to stream big data as the output of a simulation. Streaming from file then supplies big data as input to a simulation.

This example demonstrates strategies for big data simulations. To reduce the time required to run the example, the example uses a shorter simulation duration and generates less data than most big data simulations.

Open Example Model

Open the IterativeCounter project. The project opens the CounterSystem model at startup.

openProject("IterativeCounter");

Update the model to refresh the line styles, which you can use to visually identify buses. On the Modeling tab of the Simulink® Toolstrip, click Update Model. Alternatively, in the MATLAB® Command Window, enter this command.

set_param("CounterSystem",SimulationCommand="update")

Model CounterSystem

Set Up Logging to File and Simulate Model

When you call the sim function to simulate the model:

  • To specify the name of the Dataset object to hold the result of signal logging, set the SignalLoggingName configuration parameter to "topOut".

  • To stream output data to a file, enable logging to file by setting the LoggingToFile configuration parameter to "on".

  • To log to a MAT file and specify the name of the resulting file, set the LoggingFileName configuration parameter to "top.mat".

  • To keep the example short, set the StopTime configuration parameter to "5000" (seconds). Most big data simulations use a much larger stop time, which produces many more data samples to log.

out = sim("CounterSystem", SignalLoggingName="topOut", ...
    LoggingToFile="on", LoggingFileName="top.mat", StopTime="5000");

Alternatively, to set these configuration parameters before you simulate the model, use the set_param function.
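The set_param alternative can be sketched as follows. This is a minimal sketch that reuses the parameter names and values from the sim call above and assumes the CounterSystem model is already loaded by the project.

```matlab
% Store the logging settings on the model itself, then simulate.
% Parameter names and values are the same ones passed to sim above.
set_param("CounterSystem", ...
    SignalLoggingName="topOut", ...
    LoggingToFile="on", ...
    LoggingFileName="top.mat", ...
    StopTime="5000");

out = sim("CounterSystem");
```

Unlike name-value arguments passed to sim, which apply only to that simulation, values set with set_param change the model and persist until you change them again or close the model without saving.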

When logging to file is enabled on a model, simulation of that model streams logged signals directly into the file. You can choose to log data to a MAT or MLDATX file. Additionally, if the States or Output configuration parameters are enabled and the Format configuration parameter is set to Dataset, those values are streamed into the same MAT or MLDATX file.

Create DatasetRef Object to Reference Logged Dataset Within MAT File

To reference the resulting Dataset object in the logged MAT file, use a DatasetRef object. The DatasetRef object is a lightweight wrapper for referencing a Dataset object that is stored in a file, so when you use it, the referenced MAT file is not loaded into memory. In contrast, if you call the load function on this file, the software loads the entire file into memory, which might not be possible if this Dataset object contains big data.

dsr = Simulink.SimulationData.DatasetRef("top.mat","topOut");
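Before indexing into the reference, it can help to check what the file and the referenced Dataset object contain. The following sketch assumes the file top.mat from the previous step and the variable dsr created above; the variable names varNames, numSignals, and signalNames are illustrative only.

```matlab
% List the Dataset variables stored in the file without loading their data.
varNames = Simulink.SimulationData.DatasetRef.getDatasetVariableNames("top.mat");
disp(varNames)

% Query the referenced Dataset object through the lightweight reference.
numSignals  = dsr.numElements;       % number of logged elements
signalNames = dsr.getElementNames;   % names of the logged elements

fprintf("The referenced Dataset object holds %d elements.\n", numSignals);
disp(signalNames)
```

These queries return only metadata, so they stay cheap even when the logged signals themselves are very large.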

Obtain Reference to Logged Signal

You can use {} indexing of DatasetRef objects to reference individual signals within a Dataset object without loading the signals into memory. For example, to reference the second signal, enter this command.

sig2 = dsr{2};

The Values field of sig2 is a SimulationDatastore object, which is a lightweight reference to the data of signal 2, stored on disk.

sig2.Values
ans = 
  SimulationDatastore with properties:

      ReadSize: 100
    NumSamples: 50001
      FileName: '/tmp/Bdoc25a_2864802_2867623/tpad7bd522/simulink-ex81669041/IterativeCounter/top.mat'

    Data Preview:

     Time       Data 
    _______    ______

    0 sec      1    5
    0.1 sec    1    5
    0.2 sec    2    6
    0.3 sec    2    6
    0.4 sec    3    7
    :          :

Obtain More References to Other Logged Signals

You can use some of the logged signals as inputs to the simulation of the referenced model. Create lightweight references for each of the logged signals that connect to the input ports of the Model block. In this example, buses connect to the Model block input ports. The resulting Values fields are structures of SimulationDatastore objects. Each structure reflects the hierarchy of the original bus.

batchdata = dsr{1};
controls = dsr{3};
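To see how the bus hierarchy is reflected, you can inspect the Values field without reading any signal data. This is a sketch only: the field path Limits.UpperSaturationLimit is taken from a later step of this example, and the other field names depend on the bus definitions in the project, so the code lists them rather than assuming them.

```matlab
% Inspect the structure that mirrors the bus hierarchy.
busValues = batchdata.Values;

disp(size(busValues))         % dimensions of the structure (array of buses)
disp(fieldnames(busValues))   % top-level bus element names

% Drill down to one leaf element and check what kind of object it is.
leaf = busValues(2).Limits.UpperSaturationLimit;
disp(class(leaf))
```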

Create Dataset Object to Use as Simulation Input

Specify the input signals to a simulation using a Dataset object. Each element in this Dataset object provides input data to the Inport block that corresponds to the same index. Create an empty Dataset object named ds. Then, place the references to the logged signals into the Dataset object as elements number one and two.

Use {} indexing on the Dataset object to order elements.

ds = Simulink.SimulationData.Dataset;
ds{1} = batchdata;
ds{2} = controls;

Within each element of the Dataset object, you can mix references to signal data, such as SimulationDatastore objects, with in-memory data, such as timeseries objects.

To change one of the upper saturation limits from 30 to 37, enter this command.

ds{1}.Values(2).Limits.UpperSaturationLimit = timeseries(int32(37),0);

Stream Input Data into Simulation

Open the referenced model CounterAlgorithm as a top model, outside of the context of the model CounterSystem.

open_system("CounterAlgorithm");

When you simulate the model CounterAlgorithm, use the Dataset object named ds as input.

out = sim("CounterAlgorithm",LoadExternalInput="on", ...
    ExternalInput="ds");

The data that is referenced by SimulationDatastore objects is streamed into the simulation in chunks, so the full data set is never loaded into memory at once.

The data for the upper saturation limit is not streamed because that signal is specified as an in-memory timeseries. The change in saturation limit is reflected at around time 6 in the scope. The signal now saturates to a value of 37 instead of 30.
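As an alternative to passing the variable name "ds" as a string, you can attach the Dataset object to a Simulink.SimulationInput object. This is a sketch of that alternative, not the approach this example uses; it assumes that a Dataset object containing SimulationDatastore references is handled the same way when supplied through setExternalInput.

```matlab
% Configure the simulation in an object instead of name-value arguments.
simIn = Simulink.SimulationInput("CounterAlgorithm");

% Attach the Dataset object directly, without relying on a workspace
% variable name being resolved at simulation time.
simIn = setExternalInput(simIn, ds);

out = sim(simIn);
```

Keeping the input data with the SimulationInput object can make it easier to run several simulations that each use a different Dataset object.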

Analyze Logged Data Incrementally in MATLAB

SimulationDatastore objects let you analyze the logged data incrementally in MATLAB. Return to the reference to the second logged signal and assign the datastore to a new variable to simplify access to it.

dst = sig2.Values;

The ReadSize property controls how many samples each read returns. The default value for ReadSize is 100 samples. Each sample for a signal is the data logged for a single time step of simulation. Change the ReadSize value to 1000. Each read of the datastore returns a timetable representation of the data.

dst.ReadSize = 1000;
tt = dst.read;

Each read on the datastore advances the read counter. You can reset this counter and start reading from the beginning.

dst.reset;

Use SimulationDatastore objects for incremental access to the logged simulation data for big data analysis. You can iterate over the entire data record in chunks.

while dst.hasdata
    next_chunk = dst.read;
end
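The loop above reads each chunk but does not use it. As an illustration of incremental analysis, the following sketch computes the maximum of each signal column over the whole record while holding only one chunk in memory at a time. It assumes that the timetable returned by read stores the signal values in a variable named Data, as the data preview shown earlier suggests; runningMax and numRead are illustrative names.

```matlab
% Start again from the first sample.
dst.reset;

runningMax = [];   % per-column maximum seen so far
numRead = 0;       % number of samples processed

while dst.hasdata
    chunk = dst.read;                    % timetable with up to ReadSize rows
    chunkMax = max(chunk.Data, [], 1);   % maximum over time within this chunk

    if isempty(runningMax)
        runningMax = chunkMax;
    else
        runningMax = max(runningMax, chunkMax);
    end

    numRead = numRead + height(chunk);
end

disp(runningMax)
```

Because only the running result and the current chunk are kept, memory use depends on ReadSize rather than on the total number of logged samples.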

Consider Longer Simulations

This example shows how logging to persistent storage streams data from the first simulation into a MAT file. A second simulation then streams the data from that file as input.

A more realistic big data example would have a larger value for the model StopTime configuration parameter, resulting in a larger logged MAT file. The second simulation could also be configured for a longer stop time. However, even with the larger data files for output and input, the memory requirements for the longer simulations remain the same.
