Advice constructing large sparse Jacobian arrays
9 visualizzazioni (ultimi 30 giorni)
Mostra commenti meno recenti
Michael Brown
il 1 Ott 2023
Commentato: Sam Marshalik
il 5 Ott 2023
I am undertaking non-linear optimization of tensor b spline coefficients used to describe thermodynamic surfaces. Constructing the array of partial derivatives of the model predictions with respect to the model parameters (the Jacobian) is the most time consuming and memory limiting step. The arrays are large (>100,000 by >10,000) with the saving grace being that they are sparse (<2% non-zero). I have working code that strains a mac ultra (24 cores and 128 GB unified memory). With care in problem size choices things work as requested. However, I crash MATLAB and the macOS when I am insufficiently careful to not exceed memory limits.
As a practical matter, I watch MATLAB demand greater memory as it sets up the parallel loop and then release it while the loop executes. It is during the loop startup that MATLAB can run out of application memory.- the result is a crash giving: JavaScript Alert - https://127.0.0.1:31515 Message Service fatally disconnected. At this point, I have to force quit MATLAB or in some cases, restart the computer. I find that I can run a bigger problem if I submit it as a batch job rather than running the script in the command line.
At this point I am looking for advise on how to improve my code for greater speed with as little memory as possible required. Attached are code fragments that show what I have at this point with two versions - an ignorant parfor loop that lets MATLAB discover the non-zero array elements and an intellegent parfor loop that is aware of which data sites are sensitive to which model parameters. The second is slightly faster than the first but the speed enhancement is not as great as I expected.
There are also implicit questions in how MATLAB handles sparse arrays -
1. Do I save memory by type casting array indices as integers for the sparse array manipulations?
2. How small a floating point number is needed for MATLAB to label it zero for a sparse array?
3. I "invented" algorithms below to handle the parsing of array indices as needed for construction of the Jacobian. Perhaps there are cleaner quicker ways to accomplish this?
% Following are code fragments for non-linear optimization of tensor spline
% coefficients. Two methods are shown for the construction of the
% Jacobian - the matrix of partial derivatives of the model at all data
% sites with respect to the spline parameters (the coefficents)
% Actual problems are associated with roughly 500,000 data sites and 15,000
% model parameters. The code is running on a mac studio ultra with 24
% cores and 128 GB of ram. The number of workers has to be set to less than
% 24 in order to work within memory limits. The typical time for running
% an optimization ranges from 10 to 30 minutes. The majority of the time is
% consumed in the loop shown below. Thus, the bottlenck in calculations is
% the time and memory requirements of this loop to determine the Jacobians
% The spline coefficients are in vector "coefs"
% The collocation arrays (the spline basis function values) are in structure "Cntrl"
% The function "model" calculates a number of properties that are returned in a structure as separate
% vectors: y1, y2, ....
%
% In the following we track one prediction of the model, 'velsq'. The actual code has to work with multiple predictions
% The value for the finite difference derivative increment, dG, must be set
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Example 1 => code that is oblivious to the pattern of sparseness of the
% collocation arrays:
% ndat is number of data sites
% nspl is number of spline coefficients (model parameters)
predct1=model(coefs,Cntrl); %model output using the base set of coefficients
nz=0.02*ndat*nspl; % experience shows that the Jacobian sparse array has less than 2% non-zero elements
% preallocate an array to hold the Jacobian for model predictions:
A=sparse([],[],[],ndat,nspl,nz);
% set up parallel pool constants:
C = parallel.pool.Constant(coefs);
D=parallel.pool.Constant(Cntrl);
E=parallel.pool.Constant(predct1);
% Step through and perturb model parameter to determine finite difference
% derivatives of data as a function of model parameters - the Jacobian
parfor (i=1:nspl,num_worker)
coefs_tmp=C.Value;
coefs_tmp(i)=coefs_tmp(i)+dG;
predct2=model(coefs_tmp,D.Value);
A(:,i)=(predct2.velsq-E.Value.velsq)/dG; % blind calculation of prediction differences
%for all data sites. Let MATLAB decide which differences produce zeros for the sparse array A
end
% on exiting this loop we have the Jacobian in sparse array "A". This
% loop assumes no knowledge of the sparseness of the prediction differences
% and lets MATLAB make the call on what to place in the sparse array A
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Example 2 - Requires two steps - This is faster than the first example
% Below an effort is made to reduce numerical (hopefully memory as well) demands of the loop
% Step 1
% determine indices for data impacted by each parameter (do this once in an optimization run and place in Cntrl structure)
[jg,ig]=find(Cntrl.G); % get indices of all non-zero elements of the collocation array
Indexes=[jg(:) ig(:)]; % put indices in a matrix to facilitate making a list
% The cell structure Aindx{i} contains indices for data sites impacted by parameter "i"
Aindx=cell(nspl,1); % alocate a cell variable to hold all the data site indices associated with each parameter
C = parallel.pool.Constant(Indexes);
% loop through all parameters and gather indices of the data impacted by
% each parameter - make int32 to reduce memory required
parfor i=1:nspl
tmp=int32(C.Value(C.Value(:,2)==i,1));
Aindx{i}=tmp;
end
Cntrl.Aindx=Aindx; % put results in the Cntrl structure to make it available later.
% Step 2
% Construct the Jacobian - New Jacobian calculated with every iteration of the optimization
predct1=model(coefs,Cntrl); %model output using the base set of coefficients evaluated at all data sites
sdat=cell(nspl,1); % preallocate a cell structure to hold non-zero Jacobian values
% Step through and perturb model parameter to determine finite difference
% derivatives of data as a function of model parameters - the Jacobian
C = parallel.pool.Constant(coefs);
D=parallel.pool.Constant(Cntrl);
E=parallel.pool.Constant(predct1);
parfor (i=1:nspl,num_worker)
coefs_tmp=C.Value;
parm_id=D.Value.Aindx{i};
coefs_tmp(i)=coefs(i)+dG;
predict2=model(coefs_tmp,D.Value,parm_id); % calculate model values only for data sites impacted by the ith parameter
tmp=(predict2.velsq-E.Value.velsq(parm_id))/dG; % calculate finite differences only for impacted data sites
sdat{i}=tmp; % the length of each sdat{i} cell is variable and about 2% or less of total number of data
end
% at this point we have all non-zero elements of the Jacobian in a cell
% structure "sdat" with nspl cells, each cell contains the Jacobian values for
% data impacted by that parameter at data sites given in Aindx. These need to be unpacted and loaded
% into a sparse array
% Note that a cell structure rather than an array is used since the number
% of elements in each cell is variable
jtot=[];
itot=[];
stot=[];
parfor i=1:nspl
jtot=[jtot;Cntrl.Aindx{i}];
itot=[itot;i*ones(length(Cntrl.Aindx{i}),1)];
stot=[stot;s{i}];
end
A=sparse(jtot,int32(itot),stot,ndat,nspl);
% End of Example
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The function that calculates model predictions:
function out=model(coefs,Cntrl,id)
% Function to return model predictions given b spline coefficients with data site
% basis functions as described in the structure "Cntrl". Usage:
% out=model(coefs,Cntrl,id)
% where coefs are the spline parameters, Cntrl contains all the collocation
% information, id are indices of data sites influenced by a selected
% coefficient
if nargin==2
id=1:length(Cntrl.G(:,1));
end
%calculate all the necessary spline determined predictions
G=Cntrl.G(id,:)*coefs;
Gp=Cntrl.Gp(id,:)*coefs;
Gpp = Cntrl.Gpp(id,:)*coefs;
G3p = Cntrl.G3p(id,:)*coefs;
G4p=Cntrl.G4p(id,:)*coefs;
G4t=Cntrl.G4t(id,:)*coefs;
Gtt =Cntrl.Gtt(id,:)*coefs;
Gpt=Cntrl.Gpt(id,:)*coefs;
G4p=Cntrl.G4p(id,:)*coefs;
Vex=Cntrl.Vex(id,:)*coefs;
% predictions are placed in output structure
out.G=G; % Gibbs energy
out.V=Gp; % specific volume is Gp
out.Gtt=Gtt; % specific heat is -Gtt*T
out.velsq=Gp.^2./(Gpt.^2./Gtt-Gpp); % the combination of derivatives that gives the square of sound speeds
out.d2KdP2=G4p.*Gp.*Gpp.^(-2) + G3p./Gpp -2*G3p.^2.*Gp.*Gpp.^(-3); % the combination of derivatives giving the second derivative of the bulk modulus
out.G4t=G4t;% parameter to regularize - make "Gtt" smooth
4 Commenti
Risposta accettata
Walter Roberson
il 1 Ott 2023
Modificato: Walter Roberson
il 1 Ott 2023
1. Do I save memory by type casting array indices as integers for the sparse array manipulations?
No.
A1 = sparse(zeros(20,20));
A2 = sparse(zeros(20,20)); A2(20,20) = 1;
A3 = sparse(zeros(20,20)); A3(uint32(20),uint32(20)) = 2;
A4 = sparse(uint32(20), uint32(20), 3);
whos A1 A2 A3 A4
Observe that they are all exactly the same size (though one might suspect that the completely empty one might be smaller because it has no non-zero entries at all.)
2. How small a floating point number is needed for MATLAB to label it zero for a sparse array?
format long g
idx = -350:-300;
A = (10.^idx).';
S = sparse(A);
[firstnz, ~, val_there] = find(S, 1);
idx(firstnz)
val_there
B = sparse([eps(0), 0, -0])
B
So the boundary is at the smallest representable number. sparse() is testing for exact zeros, treating negative 0 the same as 0.
0 Commenti
Più risposte (1)
Matt J
il 2 Ott 2023
I am undertaking non-linear optimization of tensor b spline coefficients used to describe thermodynamic surfaces.
I suggest looking at Example2D.m in,
which demonstrates efficient ways of doing 2D tensorial spline coefficient estimation.
16 Commenti
Matt J
il 5 Ott 2023
Modificato: Matt J
il 5 Ott 2023
It appears that my makeCmn(A,B,C) is equivalent to your KronProd(A,B,C)
But it should be much faster and more memory-efficient, as demonstrated in my example.
We are back to trying to use the available associative properties of the tensors to formulate a non-linear combination of partial derivative basis functions as a linear problem.
I don't see how we are "back" there. The partial derivatives are small arrays and we've established that they can be efficiently computed using the same kind of Kronecker product manipulations that we've discussed at length.
Once you've computed the partial derivatives, you can manipulate them however you wish without any obvious computational challenges. However, if there's something I'm missing, you can elaborate on it in a new post, since your question in this one has been answered.
Sam Marshalik
il 5 Ott 2023
@Michael Brown: I am not sure if you have attempted to use Threads instead of Processes when running parpool (https://www.mathworks.com/help/parallel-computing/choose-between-thread-based-and-process-based-environments.html). If not, give it a shot, as threaded workers will be able to share memory space and will not need to duplicate variables (unless changes are made to them on the worker).
Vedere anche
Categorie
Scopri di più su Applications in Help Center e File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!