Distributed job validation passes but parallel job validation fails for Parallel Computing Toolbox.
Hi,
I am trying to use the MATLAB Parallel Computing Toolbox on a cluster. When I validate my scheduler configuration, the distributed job passes validation, but the parallel job fails with the following error:
Stage: Parallel Job
Status: Failed
Description: The given stage reached the default or user-specified timeout.
Command Line Output:
2346069.pbs001.palmetto.clemson.edu
Additionally, I find the following error in the log file on the cluster:
Node file: /var/spool/torque/aux//2346072.pbs001.palmetto.clemson.edu
Starting SMPD on node0218 node0219 node0275 node0276 ...
ssh node0218 "/opt/matlab-R2010a/bin/mw_smpd" -s -phrase MATLAB -port 26072
Warning: Permanently added 'node0218,10.125.1.218' (RSA) to the list of known hosts.
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-with-mic,password).
Launching smpd failed for node: node0218
Stopping SMPD on ...
Exiting with code: 0
The settings which I have used for the scheduler are:
set(sched, 'ClusterMatlabRoot', '/opt/matlab-new');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'ClusterOsType', 'unix');
set(sched, 'SubmitFcn',{@pbsNonSharedSimpleSubmitFcn,clusterHost, remoteDataLocation});
set(sched, 'ParallelSubmitFcn',{@pbsNonSharedParallelSubmitFcn, clusterHost, remoteDataLocation});
I have also set up a passwordless ssh connection using an RSA key. Could anyone tell me what is wrong with my configuration?
Thanks in advance.
1 Comment
Sarah Wait Zaranek
on 14 Mar 2011
Did you set up passwordless ssh between all nodes of the cluster?
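The log above shows the failure at the point where one cluster node runs `ssh node0218 ...` to start `mw_smpd`, so a key that only covers the client-to-cluster login may not be enough. One way to check this is a minimal sketch like the following, run from inside a job on the cluster. The function name `check_node_ssh` is made up for illustration, and the sketch assumes OpenSSH (for the `BatchMode` and `ConnectTimeout` options) and a node file with one host name per line, such as the one PBS exposes through `$PBS_NODEFILE`.

```shell
#!/bin/sh
# check_node_ssh NODEFILE  (hypothetical helper, not part of MATLAB or PBS)
# Tries a non-interactive ssh login to every unique host listed in NODEFILE
# and prints OK or FAIL for each one. Returns 1 if any host failed.
check_node_ssh() {
    nodefile=$1
    failed=0
    for node in $(sort -u "$nodefile"); do
        # BatchMode=yes makes ssh fail instead of prompting for a password,
        # which mimics how a batch job launches processes on other nodes.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true \
               </dev/null >/dev/null 2>&1; then
            echo "OK   $node"
        else
            echo "FAIL $node"
            failed=1
        fi
    done
    return $failed
}

# Example, inside a PBS job where $PBS_NODEFILE is set:
#   check_node_ssh "$PBS_NODEFILE"
```

Any host reported as FAIL would still ask for a password when reached from the node running the check, which would match the "Permission denied" lines in the log.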