Share Code with Workers

When you submit a job, the software evaluates the tasks of the job on different machines. Each machine must have access to all the files it needs to evaluate its tasks. The following sections explain the basic mechanisms for sharing code with the workers.

Note

For an example that shows how to share code with workers using batch, see Run Batch Job and Access Files from Workers.

Workers Access Files Directly

If the workers all have access to the same drives on the network, they can access the necessary files that reside on these shared resources. This is the preferred method for sharing data, as it minimizes network traffic.

You must define each worker session's search path so that it looks for files in the right places. You can define the path:

  • By using the job's AdditionalPaths property. This is the preferred method for setting the path, because it is specific to the job.

    AdditionalPaths identifies folders to be added to the top of the command search path of worker sessions for this job. If you also specify AttachedFiles, the AttachedFiles are above AdditionalPaths on the workers' path.

    When you specify AdditionalPaths at the time of creating a job, the settings are combined with those specified in the applicable cluster profile. Setting AdditionalPaths on a job object after it is created does not combine the new setting with the profile settings, but overwrites existing settings for that job.

    AdditionalPaths is empty by default. For a mixed-platform environment, the character vectors can specify both UNIX® and Microsoft® Windows® style paths; those settings that are not appropriate or not found for a particular machine generate warnings and are ignored.

    This example sets the MATLAB® worker path in a mixed-platform environment to use functions in both the central repository /central/funcs and the department archive /dept1/funcs, which each also have a Windows UNC path.

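    c = parcluster();        % Use the default cluster profile
    job1 = createJob(c);
    % List each shared folder in both UNIX and Windows UNC form;
    % entries that do not apply on a given worker are ignored with a warning
    ap = {'/central/funcs','/dept1/funcs', ...
         '\\OurDomain\central\funcs','\\OurDomain\dept1\funcs'};
    job1.AdditionalPaths = ap;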
    
  • By putting the path command in any of the appropriate startup files for the worker:

    • matlabroot\toolbox\local\startup.m

    • matlabroot\toolbox\parallel\user\jobStartup.m

    • matlabroot\toolbox\parallel\user\taskStartup.m

    Access to these files can be passed to the worker by the job's AttachedFiles or AdditionalPaths property. Otherwise, the version of each of these files that is used is the one highest on the worker's path.
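
As a minimal sketch of the combine-versus-overwrite behavior of AdditionalPaths described above, the following creates a job with the property supplied at creation time and then reassigns it afterward. The folder names reuse those from the earlier example and are illustrative only.

    c = parcluster();   % default cluster profile
    % Supplied at creation: combined with any AdditionalPaths in the profile
    job2 = createJob(c, 'AdditionalPaths', {'/central/funcs'});
    disp(job2.AdditionalPaths)
    % Assigned after creation: replaces the job's existing value entirely
    job2.AdditionalPaths = {'/dept1/funcs'};
    disp(job2.AdditionalPaths)
    delete(job2);       % remove the unsubmitted example job

If the profile contributes its own paths, they should appear in the first display and be absent from the second.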

Access to files among shared resources can depend upon permissions based on the user name. You can set the user name with which the MATLAB Job Scheduler and worker services of MATLAB Parallel Server™ software run by setting the MJSUSER value in the mjs_def file before starting the services. For Microsoft Windows operating systems, there is also MJSPASS for providing the account password for the specified user. For an explanation of service default settings and the mjs_def file, see Modify Script Defaults (MATLAB Parallel Server) in the MATLAB Parallel Server System Administrator's Guide.

Pass Data to and from Worker Sessions

A number of properties on task and job objects are designed for passing code or data from client to scheduler to worker, and back. This information could include MATLAB code necessary for task evaluation, or the input data for processing or output data resulting from task evaluation. The following properties facilitate this communication:

  • InputArguments — This property of each task contains the input data you specified when creating the task. This data gets passed into the function when the worker performs its evaluation.

  • OutputArguments — This property of each task contains the results of the function's evaluation.

  • JobData — This property of the job object contains data that gets sent to every worker that evaluates tasks for that job. This property works efficiently because the data is passed to a worker only once per job, saving time if that worker is evaluating more than one task for the job. (Note: Do not confuse this property with the UserData property on any objects in the MATLAB client. Information in UserData is available only in the client, and is not available to the scheduler or workers.)

  • AttachedFiles — This property of the job object is a cell array in which you manually specify all the folders and files that get sent to the workers. On the worker, the files are installed and the entries specified in the property are added to the search path of the worker session.

    AttachedFiles contains a list of folders and files that the workers need to access for evaluating a job's tasks. The value of the property (empty by default) is defined in the cluster profile or in the client session. You set the value for the property as a cell array of character vectors. Each character vector is an absolute or relative pathname to a folder or file. (Note: If these files or folders change while they are being transferred, or if any of the folders are empty, a failure or error can result. If you specify a pathname that does not exist, an error is generated.)

    The first time a worker evaluates a task for a particular job, the scheduler passes to the worker the files and folders in the AttachedFiles property. On the worker machine, a folder structure is created that is exactly the same as that accessed on the client machine where the property was set. Those entries listed in the property value are added to the top of the command search path in the worker session. (Subfolders of the entries are not added to the path, even though they are included in the folder structure.) To find out where the files are placed on the worker machine, use the function getAttachedFilesFolder in code that runs on the worker.

    When the worker runs subsequent tasks for the same job, it uses the folder structure already set up by the job's AttachedFiles property for the first task it ran for that job.

    When you specify AttachedFiles at the time of creating a job, the settings are combined with those specified in the applicable profile. Setting AttachedFiles on a job object after it is created does not combine the new setting with the profile settings, but overwrites the existing settings for that job.

    The transfer of AttachedFiles occurs for each worker running a task for that particular job on a machine, regardless of how many workers run on that machine. Normally, the attached files are deleted from the worker machine when the job is completed, or when the next job begins.

  • AutoAttachFiles — This property of the job object uses a logical value to specify that you want MATLAB to perform an analysis on the task functions in the job and on manually attached files to determine which code files are necessary for the workers, and to automatically send those files to the workers. You can set this property value in a cluster profile using the Profile Manager, or you can set it programmatically on a job object at the command line.

    c = parcluster();
    j = createJob(c);
    j.AutoAttachFiles = true;

    The supported code file formats for automatic attachment are MATLAB files (.m extension), P-code files (.p), and MEX-files (.mex). Note that AutoAttachFiles does not include data files for your job; use the AttachedFiles property to explicitly transfer these files to the workers.

    Use listAutoAttachedFiles to get a listing of the code files that are automatically attached to a job.

    If the AutoAttachFiles setting is true for the cluster profile used when starting a parallel pool, MATLAB performs an analysis on spmd blocks, parfor-loops, and other attached files to determine what other code files are necessary for execution, then automatically attaches those files to the parallel pool so that the code is available to the workers.
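
The properties in this list can be seen working together in a short sketch. It assumes a hypothetical task function saved in the client's current folder as scaleByJobData.m. The function name, the JobData contents, and the input values are illustrative and are not part of the product.

    function out = scaleByJobData(k)
    % Hypothetical task function, saved as scaleByJobData.m.
    job = getCurrentJob();            % job that owns the task being evaluated
    out = k * sum(job.JobData(:));    % JobData reaches each worker once per job
    end

On the client, the job could then be set up and run along these lines:

    c = parcluster();
    % Attach the hypothetical function file when the job is created
    job = createJob(c, 'AttachedFiles', {'scaleByJobData.m'});
    job.JobData = magic(4);           % shared by every task in the job
    % One task per inner cell; each inner cell becomes that task's InputArguments
    createTask(job, @scaleByJobData, 1, {{1}, {2}, {3}});
    submit(job);
    wait(job);
    results = fetchOutputs(job);      % one row per task, read from OutputArguments
    delete(job);

Placing the matrix in JobData rather than repeating it in every task's InputArguments means a worker that evaluates several of these tasks receives it only once.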

Note

There is a default maximum amount of data that can be sent in a single call for setting properties. This limit applies to the OutputArguments property as well as to data passed into a job as input arguments or AttachedFiles. If the limit is exceeded, you get an error message. For more information about this data transfer size limit, see Attached Files Size Limitations.
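
Because AutoAttachFiles does not pick up data files, a task that needs one typically receives it through AttachedFiles and then locates it on the worker. The sketch below assumes a hypothetical data file values.mat and a hypothetical task function sumAttachedData.m, both in the client's current folder. It also assumes that calling getAttachedFilesFolder with a file name returns the folder that holds that file on the worker.

    function total = sumAttachedData()
    % Hypothetical task function, saved as sumAttachedData.m.
    % Assumes the job attached values.mat, which contains a variable named values.
    folder = getAttachedFilesFolder('values.mat');   % location on the worker
    s = load(fullfile(folder, 'values.mat'));
    total = sum(s.values(:));
    end

A matching client-side sketch:

    values = 1:10;
    save('values.mat', 'values');     % small file, well under the transfer limit
    c = parcluster();
    job = createJob(c, 'AttachedFiles', {'values.mat', 'sumAttachedData.m'});
    createTask(job, @sumAttachedData, 1, {});   % one task, no input arguments
    submit(job);
    wait(job);
    out = fetchOutputs(job);
    delete(job);

For large data sets, a shared network folder referenced through AdditionalPaths avoids the size limit and the per-worker transfer.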

Pass MATLAB Code for Startup and Finish

As a session of MATLAB, a worker session executes its startup.m file each time it starts. You can place the startup.m file in any folder on the worker's MATLAB search path, such as toolbox/parallel/user.

These additional files can initialize and clean up a worker session as it begins or completes evaluations of tasks for a job:

  • jobStartup.m automatically executes on a worker when the worker runs its first task of a job.

  • taskStartup.m automatically executes on a worker each time the worker begins evaluation of a task.

  • poolStartup.m automatically executes on a worker each time the worker is included in a newly started parallel pool.

  • taskFinish.m automatically executes on a worker each time the worker completes evaluation of a task.

Empty versions of these files are provided in the folder:

matlabroot/toolbox/parallel/user

You can edit these files to include whatever MATLAB code you want the worker to execute at the indicated times.

Alternatively, you can create your own versions of these files and pass them to the job as part of the AttachedFiles property, or include the path names to their locations in the AdditionalPaths property.

The worker gives precedence to the versions provided in the AttachedFiles property, then to those pointed to in the AdditionalPaths property. If any of these files is not included in these properties, the worker uses the version of the file in the toolbox/parallel/user folder of the worker's MATLAB installation.
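
As a minimal sketch of supplying your own version of one of these files, the following shows a hypothetical taskStartup.m kept in the client's current folder. It assumes the same one-argument signature as the empty version in toolbox/parallel/user; check that file in your installation before relying on the signature.

    function taskStartup(task)
    % Hypothetical replacement for taskStartup.m.
    % Runs on the worker each time it begins evaluating a task.
    fprintf('Worker starting task %d\n', task.ID);
    end

Attaching the file to a job gives it precedence over any version found through AdditionalPaths or in the worker's installation:

    c = parcluster();
    % The attached copy is the one the worker runs for this job
    job = createJob(c, 'AttachedFiles', {'taskStartup.m'});

Because the file travels with the job, other jobs on the same cluster continue to use the installed default.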
