Main Content

Working with Big Data

This example shows how Simulink® models handle big data as input to and output from a simulation.

Big data refers to data that is too large to load into system memory all at once. Simulink simulations can produce big data as simulation output and consume big data as simulation input. Big data for both input and output is stored in a MAT-file on the hard disk. Only small chunks of this data are loaded into system memory at any time during simulation. This approach is known as streaming. Simulink simulations can stream data to and from a MAT-file. Streaming solves memory issues because the hard disk capacity of a system is typically much greater than the capacity of the random access memory.

The software uses logging to file to stream big data as the output of a simulation. Streaming from file then supplies big data as input to a simulation.

This example demonstrates strategies for big data simulations. To reduce the time required to run the example, the example uses a shorter simulation duration and generates less data than most big data simulations.

Open Example

Open the example, which uses the model named sldemo_mdlref_bus.

openExample('simulink_features/WorkingWithBigDataExample');

If you close the model, you can reopen it from the example folder.

open_system('sldemo_mdlref_bus');

Set Up Logging to File

To stream output data to a MAT-file, enable logging to file by selecting the Configuration Parameters > Data Import/Export > Log Dataset data to file option. You can also specify the name of the file that will contain the result.

To enable logging to file programmatically, set the model configuration parameter LoggingToFile to on.

When logging to file is enabled on a model, simulation of that model streams logged signals directly into the MAT-file. Additionally, if the States or Output configuration parameters are enabled and the Format configuration parameter is set to Dataset, those values are streamed into the same MAT-file.

Simulate Model

Call the sim command to simulate the model.

To specify the name of the Dataset object to hold the result of signal logging, set the SignalLoggingName configuration parameter to topOut.

To specify the name of the resulting MAT-file, set the LoggingFileName configuration parameter to top.mat. Set the StopTime configuration parameter to 5000 seconds. Note that the stop time will be a much larger value for most big data simulations, which results in many more data samples to log.

sim('sldemo_mdlref_bus','SignalLoggingName','topOut',...
    'LoggingToFile','on','LoggingFileName','top.mat',...
    'StopTime','5000');

Create DatasetRef Object to Reference Logged Dataset Within MAT-File

Use a DatasetRef object to reference the resulting Dataset object in the logged MAT-file. By using a DatasetRef object, the referenced MAT-file is not loaded into memory. The DatasetRef object is a light wrapper object for referencing a Dataset object that is stored in a file. Alternatively, if you call the load function on this file, the software loads the entire file into memory, which might not be possible if this Dataset object contains big data.

dsr = Simulink.SimulationData.DatasetRef('top.mat','topOut');

Obtain Reference to Logged Signal

You can use { } indexing of DatasetRef objects to reference individual signals within a Dataset object without loading the signals into memory. For example, to reference the seconds signal, enter this command.

sig2 = dsr{2};

The Values field of sig2 is a SimulationDatastore object, which is a lightweight reference to the data of signal 2, stored on disk.

sig2.Values
ans = 

  SimulationDatastore with properties:

      ReadSize: 100
    NumSamples: 50001
      FileName: '/tmp/Bdoc24b_2725827_2560263/tp69cbc3c5/simulink_features-ex31145726/top.mat'

    Data Preview:

     Time       Data 
    _______    ______

    0 sec      1    5
    0.1 sec    1    5
    0.2 sec    2    6
    0.3 sec    2    6
    0.4 sec    3    7
    :          :

Obtain More References to Other Logged Signals

You can use some of the logged signals as inputs to the simulation of the referenced model. Create lightweight references for each of the logged signals that connect to the input ports of the Model block. In this example, buses connect to the Model block input ports. The resulting Values fields are structures of SimulationDatastore objects. Each structure reflects the hierarchy of the original bus.

counterbus = dsr{1};
incrementbus = dsr{3};

Create Dataset Object to Use as Simulation Input

Specify the input signals to a simulation using a Dataset object. Each element in this Dataset object provides input data to the Inport block that corresponds to the same index. Create an empty Dataset object named ds. Then, place the references to the logged signals into the Dataset object as elements number one and two.

Use { } indexing on the Dataset object to order elements.

ds = Simulink.SimulationData.Dataset;
ds{1} = counterbus;
ds{2} = incrementbus;

Within each element of the Dataset object, you can mix references to signal data, such as SimulationDatastore objects, with in-memory data, such as timeseries objects. To change one of the upper saturation limits from 30 to 37, enter this command.

ds{1}.Values(2).limits.upper_saturation_limit = timeseries(int32(37),0);

Stream Input Data into Simulation

Open the referenced model sldemo_mdlref_counter_bus as a top model, outside of the context of the model sldemo_mdlref_bus.

open_system('sldemo_mdlref_counter_bus');

Simulate the model sldemo_mdlref_counter_bus. Use the Dataset object named ds as input.

out = sim('sldemo_mdlref_counter_bus', ...
          'LoadExternalInput','on','ExternalInput','ds');

The data that is referenced by SimulationDatastore objects is streamed into the simulation without overwhelming the system.

The data for the upper saturation limit is not streamed because that signal is specified as an in-memory timeseries. The change in saturation limit is reflected at around time 6 in the scope. The signal now saturates to a value of 37 instead of 30.

Analyze Logged Data Incrementally in MATLAB®

SimulationDatastore objects let you analyze the logged data incrementally in MATLAB. Return to the reference to the second logged signal and assign the datastore to a new variable to simplify access to it.

dst = sig2.Values;

SimulationDatastore objects allow incremental reading of the referenced data. The reading is done in chunks and is controlled by the ReadSize property. The default value for ReadSize is 100 samples. Each sample for a signal is the data logged for a single time step of simulation. Change the ReadSize value to 1000. Each read of the datastore returns a timetable representation of the data.

dst.ReadSize = 1000;
tt = dst.read;

Each read on the datastore advances the read counter. You can reset this counter and start reading from the beginning.

dst.reset;

Use SimulationDatastore objects for incremental access to the logged simulation data for big data analysis. You can iterate over the entire data record and chunks.

while dst.hasdata
    next_chunk = dst.read;
end

Consider Longer Simulations

This example shows how logging to persistent storage streams data from the first simulation into a MAT-file. A second simulation then streams the data from that file as input.

A more realistic big data example would have a larger value for the model StopTime configuration parameter, resulting in a larger logged MAT-file. The second simulation could also be configured for a longer stop time. However, even with the larger data files for output and input, the memory requirements for the longer simulations remain the same.

See Also

|

Related Topics