Choose How to Manage Data in Parallel Computing
To perform parallel computations, you need to manage data access and transfer between your MATLAB® client and the parallel workers. Use this page to decide how to transfer data between the client and workers. You can manage data such as files, MATLAB variables, and handle-type resources.
Determine Your Data Management Approach
The best techniques for managing data depend on your parallel application. Use the following tables to look for your goals and discover appropriate data management functions and their key features. In some cases, more than one type of object or function might meet your requirements. You can choose the type of object or function based on your preferences.
Transfer Data from Client to Workers
Use this table to identify some goals for transferring data from the client to workers and discover recommended workflows.
Use variables in your MATLAB workspace in an interactive parallel pool.
Transfer variables in your MATLAB workspace to workers on a cluster in a batch workflow.
Pass variables as inputs into functions that run on workers.
Give workers access to large data stored on your desktop.
Access large amounts of data or large files stored in the cloud and process them in an onsite or cloud cluster. Use a datastore.
Give workers access to files stored on the client computer.
For workers in a parallel pool: use the AttachedFiles property or the addAttachedFiles function.
For workers running batch jobs: use the AttachedFiles or AutoAttachedFiles properties.
Access custom MATLAB functions or libraries that are stored on the cluster.
Specify paths to the libraries or functions using the AdditionalPaths property.
Allow workers in a parallel pool to access non-copyable resources such as database connections or file handles. Use a parallel.pool.Constant object.
Send a message to a worker in an interactive pool running a function. Use a parallel.pool.PollableDataQueue object.
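As an illustration of the large-data rows above, the following sketch uses a parallel.pool.Constant object, which transfers data to each worker once and reuses it across parfor iterations. The data, sizes, and variable names are placeholders.

```matlab
% Sketch: give every pool worker access to one large client-side array
% without re-sending it for each parfor iteration.
bigData = rand(5000);                 % placeholder for your large data
C = parallel.pool.Constant(bigData);  % copied to each worker once

partialSums = zeros(1, 8);
parfor k = 1:8
    % C.Value refers to the worker's local copy of bigData
    partialSums(k) = sum(C.Value(:, k));
end
```

Because the workers keep their copy of C.Value between iterations, the large array does not cross the client-worker boundary on every loop iteration.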
Transfer Data Between Workers
Use this table to identify some goals for transferring data between workers and discover recommended workflows.
Offload results from workers so that another worker can process them.
Store the data in the ValueStore.
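As a sketch of this pattern, one parfeval worker can write a result into the pool ValueStore and a second worker can pick it up. This assumes an interactive pool is running; the key name and helper functions are illustrative.

```matlab
pool = gcp;                     % current interactive parallel pool
store = pool.ValueStore;        % shared by the client and all workers

f1 = parfeval(@produce, 0);     % first worker stores a partial result
wait(f1);                       % ensure the value exists before reading
f2 = parfeval(@consume, 1);     % second worker processes it
total = fetchOutputs(f2);

function produce()
    s = getCurrentValueStore;   % the pool's ValueStore, on the worker
    s("partial") = magic(4);    % store a value under a key
end

function total = consume()
    s = getCurrentValueStore;
    total = sum(s("partial"), "all");
end
```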
Transfer Data from Workers to Client
Use this table to identify some goals for transferring data from a worker to a client and discover recommended workflows.
Retrieve results from a parfeval computation using the fetchOutputs function.
Retrieve large results at the client.
Store the data in the ValueStore.
Fetch the results from a parallel job using the fetchOutputs (Jobs) function.
Load the workspace variables from a batch job using the load function.
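For example, a batch workflow might retrieve results as in the following sketch; myScript, myFunction, and the variable names are placeholders.

```matlab
% Run a script as a batch job, then load its variables on the client.
job = batch("myScript");
wait(job);                       % block until the job finishes
load(job);                       % copy the job's workspace variables

% For a batch job that runs a function, retrieve outputs instead:
job2 = batch(@myFunction, 1, {42});
wait(job2);
out = fetchOutputs(job2);        % cell array of output arguments
```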
Transfer Data from Workers to Client During Execution
Use this table to identify some goals for transferring data from a worker during execution and discover recommended workflows.
Inspect results from parfeval computations while they run.
Update a plot, progress bar, or other user interface with data from a function running in an interactive parallel pool. Use a DataQueue object.
Collect data asynchronously to update a plot, progress bar, or other user interface with data from a batch job, or for very large computations with 1000s of calls to the DataQueue:
Store the data in the ValueStore.
Store the files in the FileStore.
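A common version of the interactive-pool workflow above pairs a DataQueue with afterEach to drive a progress indicator. In this sketch the loop body and the printed message are stand-ins for real work and a real user interface.

```matlab
q = parallel.pool.DataQueue;
afterEach(q, @report);        % runs on the client for each message

N = 100;
parfor k = 1:N
    pause(rand/10);           % placeholder for the actual computation
    send(q, k);               % notify the client that iteration k is done
end

function report(~)
    persistent count
    if isempty(count)
        count = 0;
    end
    count = count + 1;
    fprintf("Completed %d of 100 iterations\n", count);
end
```

Because afterEach callbacks run on the client, the workers never touch the user interface directly.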
Compare Data Management Functions and Objects
Some parallel computing objects and functions that manage data have similar features. This section compares those objects and functions to help you choose between them.
DataQueue vs. ValueStore
DataQueue and ValueStore are two objects in Parallel Computing Toolbox™ that you can use to transfer data between the client and workers. The DataQueue object passes data from workers to the client in a first-in, first-out (FIFO) order, while the ValueStore object holds data that multiple workers as well as the client can access and update. You can use both objects for asynchronous data transfer to the client. However, DataQueue is only supported on interactive parallel pools.
The choice between DataQueue and ValueStore depends on the data access pattern you require in your parallel application. If you have many independent tasks that workers can execute in any order, and you want to pass data to the client in a streaming fashion, then use a DataQueue object. However, if you want to store and share values across multiple workers and access or update them at any time, then use a ValueStore object.
fetchOutputs (parfeval) vs. ValueStore
Use the fetchOutputs function to retrieve the output arguments of a Future object, which the software returns when you run a parfeval computation. fetchOutputs blocks the client until the computation is complete, then sends the results of the computation to the client. In contrast, you can use a ValueStore object to store and retrieve values from any parallel computation and also retrieve intermediate results as they are produced without blocking the program.
The ValueStore object is not held in system memory, so you can store large results in the store. However, be careful when storing large amounts of data to avoid filling up the disk space on the cluster.
If you only need to retrieve the output of a parfeval or parfevalOnAll computation, then fetchOutputs is the simpler option. However, if you want to store and access the results of multiple independent parallel computations, then use ValueStore. In cases where you have parfeval computations generating large amounts of data, using the pool ValueStore object can help avoid memory issues on the client. You can temporarily save the results in the ValueStore and retrieve them when you need them.
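The contrast can be sketched as follows; the computations and the key name are illustrative.

```matlab
% fetchOutputs blocks until the Future completes, then returns results.
f = parfeval(@(n) sum(rand(n, 1)), 1, 1e6);
result = fetchOutputs(f);        % client waits here

% With the pool ValueStore, the worker saves its result under a key and
% the client retrieves it whenever convenient, without blocking.
parfeval(@saveToStore, 0);
pool = gcp;
store = pool.ValueStore;
% ... later, once the key exists:
if isKey(store, "bigResult")
    result2 = store("bigResult");
end

function saveToStore()
    s = getCurrentValueStore;
    s("bigResult") = rand(1000);
end
```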
load vs. fetchOutputs (Jobs) vs. ValueStore
load, fetchOutputs (Jobs), and ValueStore provide different ways of transferring data from jobs back to the client.
load retrieves the variables related to a job you create
when you use the
batch function to run a script or an
expression. This includes any input arguments you provide and temporary
variables the workers create during the computation.
load does not retrieve the variables from batch jobs that run a function, and you cannot retrieve results while the job is running.
fetchOutputs (Jobs) retrieves the output arguments
contained in the tasks of a finished job you create using the createJob or createCommunicatingJob functions. If the job is still running when you call fetchOutputs (Jobs), the function returns an error.
When you create a job on a cluster, the software automatically creates a
ValueStore object for the job, and you can use it to store
data generated during job execution. Unlike the
fetchOutputs functions, the
ValueStore object does not automatically store data. Instead, you must manually add data as
key-value pairs to the
ValueStore object. Workers can store
data in the
ValueStore object that the MATLAB client can retrieve during the job execution. Additionally, the
ValueStore object is not held in system memory, so you can
store large results in the store.
To retrieve the results of a job after the job has finished, use the load or fetchOutputs (Jobs) function. To access the results or track the progress of a job while it is still running, or to store potentially high-memory results, use the ValueStore object.
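For example, a client can poll a running job's ValueStore to track progress. In this sketch, longTask and the "progress" key are placeholders; inside longTask, the worker would call getCurrentValueStore and update the same key periodically.

```matlab
c = parcluster;
job = batch(c, @longTask, 0, {});  % longTask is a placeholder function
store = job.ValueStore;            % created automatically with the job

while ~strcmp(job.State, "finished")
    if isKey(store, "progress")
        fprintf("Progress: %g%%\n", store("progress"));
    end
    pause(10);                     % poll every 10 seconds
end
```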
AdditionalPaths vs. AttachedFiles vs. AutoAttachedFiles
AdditionalPaths, AttachedFiles, and AutoAttachedFiles are all parallel job properties that
you can use to specify additional files and directories that are required to run
parallel code on workers.
AdditionalPaths is a property you can use to add cluster
file locations to the MATLAB path on all workers running your job. This can be useful if you have large data files, functions, or libraries stored on the cluster that the workers require but that are not on the MATLAB path by default.
The AttachedFiles property allows you to specify files or
directories that are required by the workers but are not stored on the cluster
storage. These files are copied to a temporary directory on each worker before
the parallel code runs. The files can be scripts, functions, or data files, and
must be located within the directory structure of the client.
Use the AutoAttachedFiles property to allow files needed
by the workers to be automatically attached to the job. When you submit a job or
task, MATLAB performs dependency analysis on all the task functions, or on the
batch job script or function. Then it automatically adds the files required to
the job or task object so they are transferred to the workers. Essentially, you
only want to set the
AutoAttachedFiles property to
false if you know that you do not need the software to
identify the files for you, for example, if the files your job is going to use are already present on the cluster, perhaps inside one of the AdditionalPaths locations.
Use AdditionalPaths when you have functions and libraries
stored on the cluster that are required on all workers. Use
AttachedFiles when you have small files that are
required to run your code. To let MATLAB automatically determine if a job requires additional files to run, set the AutoAttachedFiles property to true.
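The three properties can be combined when you submit a job. In this sketch, the function name, input data, paths, and file names are placeholders for your own cluster and client locations.

```matlab
c = parcluster;
inputData = rand(100);              % placeholder input
job = batch(c, @myAnalysis, 1, {inputData}, ...
    "AdditionalPaths", "/cluster/shared/mylibs", ... % already on the cluster
    "AttachedFiles", {"helper.m", "params.mat"}, ... % copied from the client
    "AutoAttachedFiles", true);                      % dependency analysis on
wait(job);
result = fetchOutputs(job);
```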