Error using gop -> Error detected on worker N -> Error during serialization

I'm receiving an error when running distributed code on a cluster:
Error using gop (line 75)
Error detected on worker 5.
Error during serialization
Error in gplus (line 24)
y = gop(@plus, x, labTarget);
Error in hamiltonian>(spmd body) (line 654)
H=gplus(H,1); if build_aniso, Q=gplus(Q,1); end
Error in hamiltonian>(spmd) (line 611)
spmd
Error in hamiltonian (line 611)
spmd
Error in relaxation (line 111)
[L0,Q]=hamiltonian(assume(spin_system,'labframe'));
Error in decoherence_naphthalene (line 53)
R=relaxation(spin_system);
The code within the spmd block is:
spmd
    % Localize the problem at the nodes
    partition=codistributor1d.defaultPartition(nterms);
    codistrib=codistributor1d(1,partition,[nterms 1]);
    local_terms=getLocalPart(codistributed((1:nterms)',codistrib));
    % Preallocate the local Hamiltonian
    H=mprealloc(spin_system,1);
    if build_aniso
        Q=cell(5,5);
        for m=1:5
            for k=1:5
                Q{k,m}=mprealloc(spin_system,1);
            end
        end
    end
    % Build the local Hamiltonian
    for n=local_terms'
        % Compute operator from the specification
        if descr.S(n)==0
            oper=operator(spin_system,descr.opL(n),{descr.L(n)},operator_type);
        else
            oper=operator(spin_system,[descr.opL(n),descr.opS(n)],{descr.L(n),descr.S(n)},operator_type);
        end
        % Add to relevant local arrays
        H=H+descr.H(n)*oper;
        if build_aniso
            for m=1:5
                for k=1:5
                    if abs(descr.T(n,k)*descr.phi(n,m))>spin_system.tols.inter_cutoff
                        Q{k,m}=Q{k,m}+descr.T(n,k)*descr.phi(n,m)*oper;
                    end
                end
            end
        end
    end
    % Collect the result
    H=gplus(H,1); if build_aniso, Q=gplus(Q,1); end
end
MATLAB seems to have no problems starting and connecting to the pool of workers:
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
Starting...
Starting parallel pool (parpool) using the 'local' profile ... connected to 12 workers.
[decoherence_naphthalene > create ] Spinach root directory determined to be /home/******/kec30/spinach_1.4.2114
Some background: I'm running this through an SGE queuing system (although I get the same problem running interactively). It always returns the error with "worker N" where N < # cores in parpool.
I'm running these simulations on MATLAB R2014a (8.3.0.532) 64-bit (glnxa64) using code provided in Spinach 1.4.2114. It's a 10-spin system (4^n matrix elements), and the code appears to start on my laptop but quickly maxes out my RAM (8 GB, about 4 GB available for MATLAB). On the cluster, I've tried reducing the memory use dramatically by going from the full system (10-spin coherences) to a greatly reduced system (3-spin coherences, with a distance cutoff), which is probably too few, but I still encounter the same problem. Since this seems to be more of a MATLAB error than a Spinach error, I thought I'd ask here for advice.
2 Comments
Edric Ellis on 8 Aug 2014
What is the type of 'H'? Can it be saved to a MAT file successfully (this is required as the communication system uses the same mechanism when transferring data)? Also, I note that Q is a cell array - I would not expect gplus(Q,1) to succeed as that's trying to add together each worker's version of 'Q' (but I think that would give you a different error message).
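One way to run the check suggested above on the workers themselves is sketched below. A variable assigned inside an spmd block is seen as a Composite on the client, so the class and the MAT-file test would need to be evaluated inside the block. The helper name can_save is hypothetical (it is not part of Spinach or MATLAB), and the save call is wrapped in a function because calling save directly in an spmd body is assumed to be rejected as a transparency violation.

```matlab
% can_save.m -- hypothetical helper, not part of Spinach or MATLAB.
% Returns true if the variable x can be written to a temporary MAT file.
%
% Possible use inside the existing spmd block, just before the gplus call:
%     fprintf('Worker %d: class(H) = %s\n', labindex, class(H));
%     ok = can_save(H);
function ok = can_save(x)
    fname = [tempname '.mat'];
    try
        save(fname, 'x');
        % save may only warn and skip a variable it cannot store,
        % so confirm that 'x' actually ended up in the file.
        info = whos('-file', fname);
        ok = any(strcmp({info.name}, 'x'));
    catch err
        fprintf('save failed: %s\n', err.message);
        ok = false;
    end
    % Clean up the temporary file if it was created.
    if exist(fname, 'file')
        delete(fname);
    end
end
```

If ok comes back false on the worker named in the error, that would point at the per-worker value of H rather than at the pool itself.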
Kevin Claytor on 8 Aug 2014
Edric,
Thanks for your reply. To run whos 'H' and try to save H, I had to comment out the problematic line:
H=gplus(H,1); if build_aniso, Q=gplus(Q,1); end
I was only able to do both of these after the spmd block.
Running whos 'H' yields:
  Name      Size      Bytes  Class        Attributes
  H         1x12       2937  Composite
And attempting to save H gives:
Warning: Saving Composites is not supported.
> In BaseRemote>BaseRemote.saveobj at 44
  In hamiltonian at 658
  In relaxation at 111
  In decoherence_naphthalene at 53
Regarding your comment about the cell array: Spinach has a few overloads (plus, minus, times) implemented for cell arrays. Here is the one for plus:
% Adds cell arrays element-by-element.
function A=plus(A,B)
    if iscell(A)&&iscell(B)&&all(size(A)==size(B))
        for n=1:numel(A), A{n}=A{n}+B{n}; end
    else
        error('cell array sizes must match.');
    end
end
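If the cell-array overload turns out not to be picked up on the workers, a possible alternative is to reduce the cell array one element at a time, so that gplus only ever sees the individual matrices. This is a minimal sketch, not Spinach's method: the helper name gplus_cells is hypothetical, it must be called from inside an spmd block, and it assumes every element of Q supports the built-in plus.

```matlab
% gplus_cells.m -- hypothetical helper, not part of Spinach or MATLAB.
% Sums the cell array Q across workers element by element.
% As with gplus(x,target), the summed values land on worker `target`;
% on every other worker the elements are set to [].
function Q = gplus_cells(Q, target)
    for k = 1:numel(Q)
        Q{k} = gplus(Q{k}, target);
    end
end
```

In the spmd block quoted in the question, this would replace Q=gplus(Q,1) with Q=gplus_cells(Q,1).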

Answers (0)