Physics-informed NN for parameter identification

Dear all,
I am trying to use a physics-informed neural network (PINN) for inverse parameter identification in an ODE or PDE.
I am wondering whether a PINN can extract the identified parameters (the coefficients in the PDE). Unfortunately, I do not know how to convert the parameters identified in the NN into the real physical parameters.
Thanks in advance.

Answers (4)

Ben on 24 Aug 2022
Hi Dawei,
The PINN in that example assumes the PDE has fixed coefficients. To follow the method of Raissi et al. you can consider a parameterized class of PDEs, e.g. for Burgers' equation you can consider:
u_t + lambda*u*u_x - mu*u_xx = 0
The method is then simply to minimize the loss with respect to both the neural network learnable parameters and the coefficients lambda and mu.
To adapt the example you can extend the parameters in the Define Deep Learning Model section:
parameters.lambda = dlarray(0);
parameters.mu = dlarray(-6);
Next you will need to modify the modelLoss function to replace the line f = Ut + U.*Ux - (0.01./pi).*Uxx with the following:
lambda = parameters.lambda;
mu = exp(parameters.mu);
f = Ut + lambda.*U.*Ux - mu.*Uxx;
Finally you will have to fix the computation for numLayers in the model function, as adding lambda and mu to parameters invalidated it. I simply did the following:
numLayers = (numel(fieldnames(parameters))-2)/2;
This will make the example similar to the authors' code. I didn't get very good results for coefficient identification when I tried this; that is possibly due to differences in the options between fmincon and the authors' use of ScipyOptimizerInterface. I'm trying that out currently, but hopefully this much will help you get started.
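To answer the original question about recovering the real parameters: after training, the identified PDE coefficients can be read straight out of the parameters struct, remembering to undo the exponential reparameterization of mu. A minimal sketch, assuming the names from the snippets above:

```matlab
% Convert the learnable parameters back to physical PDE coefficients.
lambdaIdentified = extractdata(parameters.lambda);  % coefficient on u*u_x
muIdentified = exp(extractdata(parameters.mu));     % coefficient on u_xx; exp undoes the log parameterization
fprintf("Identified lambda = %.4f, mu = %.6f\n", lambdaIdentified, muIdentified);
```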

15 Comments

Dear Ben,
Much appreciate your great help and effort. I will modify the code according to your suggestion.
Meanwhile, I'd be grateful if you could post an update if it can still be improved.
Best wishes,
Dawei
Hi Ben,
Why did you divide by 2 in the revised numLayers? Since there are 9 layers, just subtracting 2 from numel would give 9, which is the correct number of layers. Dividing further by 2 makes it 4.5, which is invalid.
Hi Muhammad,
The example appears to have been updated to use dlnetwork since my last response.
Previously parameters was a struct of weights and biases for fullyconnect operations on dlarrays, and since each fullyconnect has a weight and a bias, the number of fullyconnect operations was equal to the number of fields in the parameters struct divided by 2.
In the updated example a dlnetwork object is used as the net input to modelLoss so there's no need to manually compute numLayers.
In this updated version you can either extend modelLoss to take extra arguments for mu and lambda - including as a parameter struct like parameters = struct(mu=mu,lambda=lambda) - or combine the dlnetwork, mu and lambda in one parameter cell or struct, e.g. parameters = struct(net=net,mu=mu,lambda=lambda).
Hope that helps,
Ben
Dear Ben,
I understand that you changed the example (unfortunately you overwrote the old version, which causes problems for anyone who had linked to it). Also, this example now uses a slow iterative training loop!
There are two PINN examples for solving the Burgers equation, one with LBFGS and one without
("solve-partial-differential-equations-using-deep-learning" and "solve-partial-differential-equations-with-lbfgs-method-and-deep-learning").
I thought the advantage would be the speed of the LBFGS algorithm compared to the one without, but now this example is more or less just as slow as the other one. Is there a way to avoid this iteration loop, or could this example be combined with "Transfer Learning" to accelerate the runtime?
Dear CSCh,
The previous version of the example used fmincon so that it could use the LBFGS optimization algorithm; however, since then we have introduced lbfgsupdate, which allows you to write the training as a standard custom training loop. My understanding is that both approaches are iterative: the parameters are updated at each iteration toward more optimal values.
My understanding is that LBFGS is a more complex (2nd-order) optimization algorithm than ADAM (1st-order), and in practice a single iteration of training would typically be slower with LBFGS than with ADAM; however, LBFGS is able to take more precise steps to find more optimal parameters.
A common approach I have seen is to first train the PINN for some number of iterations with ADAM as in this example then continue to train for further iterations with LBFGS as in this example. The idea is that ADAM can get you to a nearly optimal set of parameters reasonably quickly, then LBFGS can refine these further.
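The Adam-then-LBFGS schedule described above can be sketched as a custom training loop. This is only an outline; the modelLoss signature, the data variables, and the iteration counts are assumptions based on the example:

```matlab
% Stage 1: Adam warm-up to get near-optimal parameters quickly
% (assumed modelLoss signature from the example).
avgG = []; avgSqG = [];
for iter = 1:2000
    [loss,gradients] = dlfeval(@modelLoss,net,X,T,X0,T0,U0);
    [net,avgG,avgSqG] = adamupdate(net,gradients,avgG,avgSqG,iter);
end
% Stage 2: L-BFGS refinement starting from the Adam solution.
lossFcn = @(net) dlfeval(@modelLoss,net,X,T,X0,T0,U0);
solverState = lbfgsState;
for iter = 1:500
    [net,solverState] = lbfgsupdate(net,lossFcn,solverState);
end
```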
Dear Ben,
what I meant was that the previous version did not need the explicit loop over epochs (numEpochs), which makes this example very slow.
Combining both is very interesting; where did you see this "common approach"?
Thank you,
cheers,
Chris
Dear Chris,
I believe the previous version had an implicit loop over epochs inside the fmincon call, as it still has to optimize by taking steps. I can look into whether there are considerable performance differences between these approaches; the expectation is that an iteration/epoch of LBFGS with fmincon or with lbfgsupdate should take roughly the same amount of time.
The combined approach is in the original PINNs authors' code: GitHub link - see the train method, they first use self.sess.run(self.train_op_Adam, tf_dict) to train with ADAM, then self.optimizer.minimize is using self.optimizer which is a tf.contrib.opt.ScipyOptimizerInterface configured to use LBFGS.
Hope that helps,
Ben
Dear Ben,
Yes, you are right, you have to set the number of iterations in the fmincon options as well. However, if I make a quick comparison between these 3 examples with regard to runtime, trying to keep everything the same (numEpochs = numIterations = 1500), including the other parameters (layers/neurons etc., and also switching out the monitor plot), I see that the previous LBFGS version needs only ~3:46 min, your updated version 11:14 min, and the Adam example 8:42 min (whereas Adam has the best accuracy). So where does this discrepancy, especially between the two LBFGS versions, come from? Maybe you can check it too?
Thank you for the link.
Chris
Hi Chris,
Just wanted to jump in and say that the discrepancies in results and performance between the old example with fmincon and the new one with lbfgsupdate are likely caused by two issues.
1) The fmincon example does all calculations in double precision by default, while the new example does the calculations in single precision. In this case, the workaround for doing the calculations in double precision is to use the following code after creating the dlnetwork:
% Convert learnables to double
learnables = pinn.Learnables;
learnables = dlupdate( @double, learnables );
pinn.Learnables = learnables;
2) The fmincon example has appropriate stopping conditions, while the new example does not. You can add these by inserting the following code in the training loop:
% Stop training if stopping conditions are met
if solverState.GradientsNorm < gradientTolerance || solverState.StepNorm < stepTolerance || strcmp(solverState.LineSearchStatus, "failed")
    break
end
where gradientTolerance and stepTolerance are tolerances you set yourself. I believe the defaults for these in fmincon are 1e-6.
I have gone ahead and created an enhancement request for that example so that our documentation team can consider adding the suggested code to the example. In the meantime, I hope the workarounds I have suggested work for you and allow you to get better results with improved performance. If it does not, please let me know!
Cheers,
James
Dear James, thanks for stepping in.
I am wondering because I would assume that using single precision is faster than double?
By the way you mean:
learnables = net.Learnables;
learnables = dlupdate( @double, learnables );
net.Learnables = learnables;
However, the old version is still quicker even with your tips.
I would really appreciate it if you could test it yourself; maybe I did something wrong?
Cheers
Chris
Hi Chris,
The issue with using single precision is that you might not get as accurate a result as with double precision. This is an issue because, if suitable break conditions are not set in the training loop, lbfgsupdate will perform lots of unnecessary calculations at the tail end of the optimization during the line-search procedure, because it can't find a suitable solution at that precision level.
One other suggestion to improve the performance would be to accelerate the custom training loop via dlaccelerate. In particular, you can replace the loss function specification with
accFun = dlaccelerate(@modelLoss);
lossFcn = @(net) dlfeval(accFun,net,X,T,X0,T0,U0);
Another thing to bear in mind is that the new example includes plotting which can add additional overhead. This means that when comparing these two examples, you should comment out any code for plotting in the new example for a fairer comparison.
Finally, it is important to note that L-BFGS is very sensitive to network initialization (even more so than stochastic optimizers), and if the network is initialized with a poor set of parameters, a good set of parameters may never be found (or at least may take a very long time to find). This means that, in order to make a fair comparison, multiple training runs should be performed and the results should be averaged.
I made the above modifications for a very similar example and can confirm that doing so improved the performance of the updated workflow via lbfgsupdate to the point where its performance was comparable and was even able to achieve more accurate results than the old workflow via fmincon.
I hope this information helps!
Cheers,
James
Dear James, thank you for your effort, that helps a lot.
"Finally, it is important to note that L-BFGS is very sensitive to network initialization (even more so than stochastic optimizers), and if the network is initialized with a poor set of parameters, a good set of parameters may never be found (or at least may take a very long time to find). "
I would assume that if one uses L-BFGS, transfer learning would be very helpful.
Best regards,
Chris
Mitra on 25 Aug 2023
Edited: Mitra on 25 Aug 2023
Has anyone successfully implemented an inverse PINN using MATLAB? I tried, but the additional parameters don't seem to change and the corresponding gradient remains at 0.
Hello Ben,
Could you please help me with this issue?
https://ww2.mathworks.cn/matlabcentral/answers/2067501-physics-informed-nn-for-parameter-identification
Thanks a lot


Rohit on 27 Jul 2023
Edited: Rohit on 28 Jul 2023
Hello @CSCh @Ben @James Gross, I am stuck in the same position of solving an inverse problem using a PINN. Is it possible to share the latest full code for this? I believe there are some nomenclature changes between the ADAM and L-BFGS codes in the Help Center, which makes them a bit confusing to understand. Specifically, I wanted to know how to add the unknown parameters of the differential equation to the trainable parameters of the NN.
Thanks for your help.
Rohit

4 Comments

Dear Rohit, I wasn't interested in the unknown parameters; I was just "complaining" about Ben updating and overwriting the PINN L-BFGS (dlnetwork) example, which makes some posts in this thread inconclusive and also slows down the runtime. There were a few discussions about the last point.
Maybe the others here can help you?
@Rohit I believe it should be possible to modify the new version of the example to solve the inverse problem with lbfgsupdate by:
  1. Creating a struct of parameters with parameters.net = net and parameters.lambda = dlarray(0) and parameters.mu = dlarray(-6) as above.
  2. Implementing modelLoss to take parameters in place of net, making similar modifications to above to use parameters.lambda and parameters.mu to specify the PDE loss term, and computing the necessary gradients for parameters.lambda and parameters.mu using dlgradient.
  3. Passing parameters in place of net to the lbfgsupdate function in the training loop.
This should be the most natural way to add the extra unknown coefficients and learn them via lbfgsupdate, rather than adding these values to the dlnetwork itself.
The workflow using adamupdate should be similar to before.
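The three steps above could be sketched like this (a hedged outline; the modelLoss signature and data variables are assumed to follow the example):

```matlab
% Step 1: combine the network and unknown coefficients in one struct.
parameters.net = net;
parameters.lambda = dlarray(0);
parameters.mu = dlarray(-6);
% Step 2: modelLoss takes parameters in place of net and must return
% [loss,gradients], computing gradients for net, lambda and mu via dlgradient.
lossFcn = @(p) dlfeval(@modelLoss,p,X,T,X0,T0,U0);
% Step 3: pass parameters in place of net to lbfgsupdate.
solverState = lbfgsState;
for iter = 1:maxIterations
    [parameters,solverState] = lbfgsupdate(parameters,lossFcn,solverState);
end
```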
Hope that helps.
@Ben Hi Ben, I have some difficulty with your step 2. When you said "Implementing modelLoss to take parameters in place of net", I know how to modify my modelLoss to include the parameter mu, but how should I modify modelLoss so the gradient contains information for both net.Learnables and mu? I can calculate the gradients using dlgradient, but the outputs are of different types: one is a table and the other simply a dlarray.
@Wenyu Li - In the above I use a struct called parameters to contain lambda and mu. You can also use a struct to contain the gradients, even though they are different types. Here's some toy code to demonstrate this - note that this doesn't actually train and solve the inverse problem well:
% Solve heat equation du/dt = c * d2u/dx^2 with neural net and unknown c.
batchSize = 100;
dimension = 2; % X = (t,x)
% Set up neural net.
hiddenSize = 100;
net = [
    featureInputLayer(dimension)
    fullyConnectedLayer(hiddenSize)
    tanhLayer
    fullyConnectedLayer(hiddenSize)
    tanhLayer
    fullyConnectedLayer(1)];
% Create struct of parameters
parameters.net = dlnetwork(net);
parameters.c = dlarray(1);
% There are multiple solutions to the heat equation above since we haven't specified initial or boundary conditions.
% We'll avoid this issue by passing in samples of the "true solution" we're trying to approximate.
% In practice this may be data you have collected from simulations.
% Use the heat kernel p(t,x) as true solution. The heat kernel solves dp/dt = d2p/dx^2
% for t>0.
% Setting q(c,t,x) = p(c*t,x) gives a solution dq/dt = c * d2q/dx^2.
heatKernel = @(X) exp(-(X(2,:).^2)./(4*X(1,:)) )./sqrt(4*pi*X(1,:));
trueC = 0.95;
solutionFcn = @(X) heatKernel([trueC;1] .* X);
% Training loop
avgG = [];
avgSqG = [];
maxIters = 1000;
lossFcn = dlaccelerate(@modelLoss);
% Sample X in (0.001, 1.001) x (0.001, 1.001). The 0.001 is to avoid t=0.
X = dlarray(0.001+rand(dimension,batchSize),"CB");
uactual = solutionFcn(X);
for iter = 1:maxIters
    [loss,gradients] = dlfeval(lossFcn,X,parameters,uactual);
    fprintf("Iteration : %d, Loss : %.4f \n",iter, extractdata(loss));
    [parameters,avgG,avgSqG] = adamupdate(parameters,gradients,avgG,avgSqG,iter);
end
function [loss,gradients] = modelLoss(X,parameters,uactual)
    % X = (t,x).
    % PDE: du/dt = c * d2u/dx2
    u = predict(parameters.net, X);
    du = dlgradient(sum(u,2), X, RetainData=true, EnableHigherDerivatives=true);
    dudt = du(1,:);
    dudx = du(2,:);
    d2u = dlgradient(sum(dudx,2),X,RetainData=true);
    d2udx2 = d2u(2,:);
    pdeLoss = l2loss(dudt, parameters.c*d2udx2);
    reconstructionLoss = l2loss(u,uactual);
    loss = pdeLoss + reconstructionLoss;
    % Differentiate the total loss so the reconstruction term also drives the updates.
    [gradients.net, gradients.c] = dlgradient(loss,parameters.net.Learnables,parameters.c);
end
Alternatively you can use a cell array: parameters = {net,c} and write:
function [loss,gradients] = modelLoss(X,parameters,uactual)
    % X = (t,x).
    % PDE: du/dt = c * d2u/dx2
    u = predict(parameters{1}, X);
    du = dlgradient(sum(u,2), X, RetainData=true, EnableHigherDerivatives=true);
    dudt = du(1,:);
    dudx = du(2,:);
    d2u = dlgradient(sum(dudx,2),X,RetainData=true);
    d2udx2 = d2u(2,:);
    pdeLoss = l2loss(dudt, parameters{2}*d2udx2);
    reconstructionLoss = l2loss(u,uactual);
    loss = pdeLoss + reconstructionLoss;
    % Differentiate the total loss so the reconstruction term also drives the updates.
    [gradients{1},gradients{2}] = dlgradient(loss,parameters{1}.Learnables,parameters{2});
end
I find the struct approach more readable.
Finally you can also just have separate gradient outputs. This means writing multiple adamupdate calls but gives a little extra flexibility, for example you can set separate learn rates for the net and c parameters:
% Training loop
avgG = [];
avgSqG = [];
avgGCoeff = [];
avgSqGCoeff = [];
lrCoeff = 1e-1;
maxIters = 1000;
lossFcn = dlaccelerate(@modelLoss);
X = dlarray(0.001+rand(dimension,batchSize),"CB");
uactual = solutionFcn(X);
for iter = 1:maxIters
    [loss,netGradient,coeffGradient] = dlfeval(lossFcn,X,parameters,uactual);
    fprintf("Iteration : %d, Loss : %.4f \n",iter, extractdata(loss));
    [parameters{1},avgG,avgSqG] = adamupdate(parameters{1},netGradient,avgG,avgSqG,iter);
    [parameters{2},avgGCoeff,avgSqGCoeff] = adamupdate(parameters{2},coeffGradient,avgGCoeff,avgSqGCoeff,iter,lrCoeff);
end
function [loss,netGradient,coeffGradient] = modelLoss(X,parameters,uactual)
    % X = (t,x).
    % PDE: du/dt = c * d2u/dx2
    u = predict(parameters{1}, X);
    du = dlgradient(sum(u,2), X, RetainData=true, EnableHigherDerivatives=true);
    dudt = du(1,:);
    dudx = du(2,:);
    d2u = dlgradient(sum(dudx,2),X,RetainData=true);
    d2udx2 = d2u(2,:);
    pdeLoss = l2loss(dudt, parameters{2}*d2udx2);
    reconstructionLoss = l2loss(u,uactual);
    loss = pdeLoss + reconstructionLoss;
    % Differentiate the total loss so the reconstruction term also drives the updates.
    [netGradient,coeffGradient] = dlgradient(loss,parameters{1}.Learnables,parameters{2});
end
This latter approach may be relevant, since if you inspect the gradients in this toy example, the gradients for the network learnables and the coefficient have different orders of magnitude, and may benefit from distinct learning rates.
Hope that helps.


Hello @CSCh @Ben @James Gross, I am also in need of help. Would it be possible to use this same PINN where the geometry of the problem is defined by a mesh (similar to this type of geometry: https://www.mathworks.com/help/pde/ug/pde.pdemodel.geometryfrommesh.html)?

2 Comments

@Joshua Prince yes, you should be able to use a mesh to define the collocation points used to train the model. You may also be able to use the PDE Toolbox features to solve the PDE on the mesh, which you could use to either compare with the PINN approach, or use as extra data for training the PINN (which is mesh-free). Unfortunately I don't have code to hand working through such an example.
Thanks for the help!


Hello @Ben @James Gross, I want to use a PINN to solve PDEs using two neural networks (net_1 with inputs x, t and output c_w; net_2 with inputs x, t, r and output c_intra) but with one loss function (loss_total = loss_PDE + loss_BCIC + loss_OB). I tried using Adam to update the learnable parameters of net_1 and net_2 and it works, but the errors are not small enough, so I want to try the L-BFGS method; however, I have no idea how to do it in MATLAB. The following code shows how I tried it, but it didn't work. Could you help me solve this problem?
Thanks for your help!
[loss_total,loss_PDE,loss_BCIC,loss_OB,t_BTC,c_w_BTC_Pred,gradients_1,gradients_2]= dlfeval(@modelLoss,net_1,net_2,t_w,x_w,r_w,t_intra,x_intra,r_intra,...
tBC1_w,tBC2_w,xBC1_w,xBC2_w,cBC1_w,tBC1_intra,tBC2_intra,xBC1_intra,...
xBC2_intra,rBC1_intra,rBC2_intra,tIC_w,xIC_w,cIC_w,tIC_intra,xIC_intra,rIC_intra,cIC_intra);
% Initialize the TrainingProgressMonitor object. Because the timer starts
% when you create the monitor object, make sure that you create the object
% close to the training loop.
monitor = trainingProgressMonitor( ...
Metrics="TrainingLoss", ...
Info="Epoch", ...
XLabel="Epoch");
% Train the network using a custom training loop. Use the full data set at
% each iteration. Update the network learnable parameters and solver state using the lbfgsupdate function. At the end of each iteration, update the training progress monitor.
for i = 1:numEpochs
    [net_1, solverState] = lbfgsupdate(net_1,[loss_total,gradients_1],solverState);
    [net_2, solverState] = lbfgsupdate(net_2,[loss_total,gradients_2],solverState);
    updateInfo(monitor,Epoch=i);
    recordMetrics(monitor,i,TrainingLoss=solverState.Loss);
end

1 Comment

@binlong liu - you appear to be using lbfgsupdate incorrectly. The 2nd input should be the loss function as a function_handle, see this part of the doc.
Note that you don't need two lbfgsupdate calls, you can put the two networks together in a cell-array or struct as described here. Actually this is likely to be necessary for lbfgsupdate since it calls the loss function for you, and you likely need both networks to do this, and to get the correct gradients.
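A hedged sketch of the corrected usage, combining both networks in one struct and passing a loss-function handle (the variable names are assumed from the snippet above, and modelLoss must be rewritten to take the struct and return the gradients in a matching struct):

```matlab
% Combine both networks so a single lbfgsupdate call updates them together.
parameters = struct(net_1=net_1, net_2=net_2);
% The loss function must return [loss,gradients]; rewrite modelLoss to take
% the struct, and append the remaining data arguments as in the dlfeval call above.
lossFcn = @(p) dlfeval(@modelLoss, p, t_w, x_w, r_w, t_intra, x_intra, r_intra);
solverState = lbfgsState;
for i = 1:numEpochs
    [parameters, solverState] = lbfgsupdate(parameters, lossFcn, solverState);
    updateInfo(monitor, Epoch=i);
    recordMetrics(monitor, i, TrainingLoss=solverState.Loss);
end
```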

Accedi per commentare.

Release

R2022a

Asked:

on 22 Aug 2022

Commented:

Zhe
on 8 Jan 2024
