lsqnonlin optimization: large condition number of Jacobian matrix at all iterations, but full rank


SA-W on 2 Jun 2023


https://it.mathworks.com/matlabcentral/answers/1977249-lsqnonlin-optimization-large-condition-number-of-jacobian-matrix-at-all-iterations-but-full-rank

Edited: Matt J on 7 Jun 2023

I use lsqnonlin to solve a non-linear data-fitting problem (fitting parameters of a partial differential equation). The minimization problem is

f = ||g^sim(params) - g^exp ||^2

Currently, I optimize 18 parameters and the vectors g^sim and g^exp have 1470 entries, which is a collection of vectors with 147 entries at 10 different times.

The exact parameters (re-identification) are given by

Here is my code:

opts = optimoptions('lsqnonlin', ...
                            'StepTolerance', 1e-9, ...
                            'FunctionTolerance', 1e-9, ...
                            'OptimalityTolerance', 1e-9, ...
                            'MaxIterations', 250,...
                            'SpecifyObjectiveGradient', true, ...
                            'CheckGradients', false);
sol = lsqnonlin(@(params)objFun(params, g_exp), E0, lb, ub, opts);
function [f,J] = objFun(params, g_exp)
    g_sim = ...; %call pde solver
    J = ...; % call pde solver
    
    f = g_sim - g_exp; 
    
    %scaling of f and J is explained later...
    
end

With a given start vector, lsqnonlin returned exitflag=3 after 25 iterations and 26 function calls. The sum of squares is 4.3470e-16 and firstorderopt is 3.2449e-08. The reference parameters from above are also re-identified to six digits after the decimal point, which indicates a perfect fit.

However, what I observed is that the Jacobian matrix

J = d g^sim(params) / d params

at all 25 iterations has condition numbers

cond(J) \approx 1e11
rank(J) = 18

This is inconvenient since I would like to compute some quality measures like the correlation matrix, which does not really make sense for such high condition numbers (the product J'*J consequently has even greater condition numbers).
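For reference, a correlation matrix of this kind is usually derived from the linearized covariance estimate sigma^2*inv(J'*J). The following is only a minimal sketch under stated assumptions: J and r are random stand-ins (hypothetical, not the PDE data from this question), and the QR factor of J is used so that J'*J is never formed explicitly.

```matlab
% Hypothetical stand-in data: a 1470-by-18 Jacobian and a residual vector
rng(0);
J = randn(1470, 18);
r = 1e-3*randn(1470, 1);

[m, n] = size(J);
[~, R] = qr(J, 0);               % economy QR, so that J'*J equals R'*R
Rinv   = R \ eye(n);             % inverse of the triangular factor
sigma2 = (r'*r) / (m - n);       % residual variance estimate
covP   = sigma2 * (Rinv*Rinv');  % covariance estimate sigma2*inv(J'*J)
sd     = sqrt(diag(covP));       % parameter standard deviations
corrP  = covP ./ (sd*sd');       % correlation matrix with unit diagonal
```

Working from R instead of J'*J avoids squaring the condition number in the computation, but it would not make the estimate any more trustworthy if J itself is nearly rank deficient.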

Also, the optimization fails for some other start vectors which indicates that there might be a problem with my Jacobian or the parameters.

As far as I know, such high condition numbers can be traced back to bad scaling. Currently, I scale as follows:

w=1/abs(max(g_exp(g_exp~=0)));
W=w*ones(length(g_exp),1);
%scale residual and jacobian
f=W(:).*r(:);
J = J.*W(:);

I also tried different scaling approaches which are very common in my field. So I think the issue is not related to scaling the optimization problem.

As you can see, rank(J)=18 at all iterations, which (if I am not mistaken) indicates that the parameters are not linearly dependent on each other.

Having said all that, what might be the reasons for such high condition numbers, and what could I try to reduce them? Is my data vector g_exp maybe not appropriate?

I am also wondering why the optimization works very well under these circumstances.


Accepted Answer

Matt J on 2 Jun 2023


Edited: Matt J on 2 Jun 2023

Your scaling isn't really doing anything meaningful, since all residuals are weighted equally. Really, you are just multiplying your Jacobian by a scalar, which cannot change its cond() number, e.g.,

J=rand(3,2);
cond(J)
ans = 8.0366
cond(rand*J)
ans = 8.0366

Another scaling you need to consider, maybe even more important than the scaling of the residual, is the units of your parameters. This would introduce weights on the columns of your Jacobian, not just the rows.
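To illustrate the difference between a scalar factor and column weights, here is a small sketch. The matrix J below is a hypothetical stand-in whose columns are deliberately given mismatched magnitudes, as if the parameters were expressed in very different units.

```matlab
rng(1);
J = randn(50, 3) .* [1, 1e-4, 1e4];  % hypothetical Jacobian, mismatched column scales
c0      = cond(J);                   % large, driven by the column magnitudes
cScalar = cond(0.37*J);              % a scalar factor leaves cond() unchanged
d       = 1 ./ vecnorm(J, 2, 1);     % one weight per column, i.e. per parameter
cCols   = cond(J .* d);              % condition number after column equilibration
```

Multiplying the columns by d corresponds to the substitution x = d(:).*z in the parameters, since the Jacobian with respect to z is J*diag(d).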

Another possibility is simply that your problem is over-parametrized, creating a continuum of non-unique solutions. That's not always a big deal, but may account for why the optimization seems to "fail" (in your words) at different initial points. You haven't described what you're interpreting as a failure, but I'm assuming it means you get unexpected results for the parameters.

"As you can see, rank(J)=18 at all iterations, which (if I am not mistaken) indicates that the parameters are not linearly dependent on each other."

Right, but that's often not helpful. You are using the rank() command's default tolerance setting, which may or may not be appropriate for your problem. It's very easy to construct matrices which pass rank()'s default criteria, but shouldn't be considered full rank, e.g.,

A=diag([1e-14,1]); 
cond(A)
ans = 1.0000e+14
rank(A)
ans = 2
rank(A,1e-10)
ans = 1
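Looking at the singular values directly makes the role of the tolerance explicit. This sketch reuses the matrix A from above and reproduces both rank results by counting singular values; the default tolerance shown is the documented max(size(A))*eps(norm(A)).

```matlab
A = diag([1e-14, 1]);
s = svd(A);                               % singular values, largest first
defaultTol = max(size(A)) * eps(max(s));  % tolerance that rank(A) uses by default
relative   = s / s(1);                    % spectrum relative to the largest value
nDefault   = sum(s > defaultTol);         % same count as rank(A)
nStrict    = sum(s > 1e-10);              % same count as rank(A, 1e-10)
```

For a real Jacobian, plotting s on a logarithmic axis, for instance with semilogy(s, 'o'), shows whether there is a clear gap that suggests a sensible tolerance.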


SA-W on 2 Jun 2023

params.png

"What you want in a qualitative sense is for g^sim to be 'comparably sensitive' to changes in each of the parameters. You don't want x(i)+1 to cause a change of 1 while x(j)+1 causes a change of 1e10, for example."

I think exactly this is the problem in my application. Let me try to explain it in more detail:

My parameters are values of a function at given support points. In the attachment, you see a plot of this function where the x-axis has nine support points and the parameters are the y-values at these points. Linear interpolation is applied between the points. This function is evaluated in my finite element program (pde solver) which returns g^sim and J.

As I said, the data vectors g^sim, g^exp are a collection of 10 different times which are vertically appended in a vector:

g^sim = [g^sim_t1; ...; g^sim_t10]

g^exp = [g^exp_t1; ...; g^exp_t10]

Similarly, the Jacobians at the different times are vertically appended:

J = [J_t1; ...; J_t10];

The problem I have is the following: at time t1, the above function is evaluated mainly in the interval [1.0, 1.2]. This means that only the three parameters defined at the points 1.0, 1.1, 1.2 are actually used (activated) when calculating g^sim_t1 and J_t1. (Ideally, J_t1 would be zero in the six columns associated with the non-activated parameters.) With increasing time, this interval becomes broader; at time t10, the function is evaluated over the entire interval [0.8, 1.6], so all parameters are activated when calculating g^sim_t10 and J_t10. In other words, g^sim_t1 is sensitive almost exclusively to changes in three of the nine parameters, while g^sim_t10 is sensitive to changes in all parameters. This is due to the physics behind my problem, and there is no way for me to circumvent this unequal activation of parameters at different times. I also observed that I need to collect the vectors at different times to make sure that the re-identification is successful for some start values. Using only g^sim_t10, for instance, proved to be less beneficial.

Anyway, I think this suggests a scaling of the Jacobian. IMHO, the "less activated" parameters should be weighted higher, right? Do you have any idea or proposal for what such a scaling could look like?

What I could do, for instance, is count how often the function is evaluated in each interval at every time. Something like: at t1, the function is evaluated 56 times in [1.0 1.2], 87 times in [1.2 1.4], and so forth. But I was not able to derive a scaling from this information.

Based on what you have said so far, though, this would most likely improve the condition number of the Jacobian.

SA-W on 5 Jun 2023

params.png

I see what you mean. The scaling boils down to a variable transformation, and the objective function must be evaluated at the new variable z. Let me try to explain why this is difficult to realize in my case:

[~,J0] = gsim(x0);
cond(J0)
ans = 1.95e11
w = 1./vecnorm(J0,1,1)
w = 0.0004  0.0008   0.0006   0.0017   0.0050    0.0042    0.0065    0.0375
cond(J0.*w)
ans = 4.33e+10

Here, x0 is the reference solution that I want to re-identify. As I said, the (here 9) parameters are the interpolation values of a 1d function at given support points (see the attachment). The vector w clearly illustrates that the last parameter (w(9)=0.0375) is less activated in the calculation of gsim(x0) than, for instance, the first parameter (w(1)=0.0004). This is inherent to the physics that I am working on and probably explains why I see large condition numbers.

Based on my understanding, what the scaling z=x.*w does is to increase the values of the less activated parameters such that greater values are assembled in the associated Jacobian columns. However, to make sure that gsim(x) can be evaluated, the function must be convex and the parameters in the same order of magnitude. The vector w above would imply that x(9) is at least two orders of magnitude bigger than x(1),x(2),... . Also, the above w also violates the convexity constraint.

Based on this requirement, do you have any remedy/idea as to how the vector w (z=x.*w) could be constructed? Ideally, I would like to introduce weights on some columns of the Jacobian, but without having to do a variable transformation. But this is not possible, right?

SA-W on 5 Jun 2023

Edited: SA-W on 5 Jun 2023

"It looks moot, since the normalization isn't lowering the condition number much at all. Although, you might want to try vecnorm(J0,1,2) or else cond(J0.*w,1) so that the norm of the condition number matches the norm used to weight J0."

[~,J0] = gsim(x0);
cond(J0, 2)   % the 1-norm does not work for a rectangular matrix
ans = 3.75e+10
w1 = 1./vecnorm(J0,2,1);   % column-wise
cond(J0.*w1, 2)
ans = 1.63e+10
w2 = 1./vecnorm(J0,2,2);   % row-wise
cond(J0.*w2, 2)
ans = 3.27e+11

% your program
[~,idx] = licols(J0);   % second column of J0 is removed
cond(J0(:,idx))
ans = 1.58e+10
[~,idx] = licols(J0, 1e-9);   % second and last column of J0 are removed
cond(J0(:,idx))
ans = 37.0

The row-wise scaling reduces the condition number even less than column-wise. This makes sense to me since the main problem here is insufficient activation of parameters, which is more visible to column-wise scaling (correct me if I am wrong).

I applied your FEX tool licols, which gives a reasonable condition number if the second and last columns of J0 are removed. However, I cannot do the optimization without x(2) and x(9), as the values of the convex function at the corresponding two support points would be undefined. Would such a strategy make sense at all in your opinion?

" No, it shouldn't. You are initializing at x0./w in the scaled problem, which means gsim is still being evaluated at the same initial x0 as before. "

Yes, but the problem then occurs at intermediate iterations. Most likely, my pde solver cannot recover from a point z where z(9) is two or more orders of magnitude higher than z(1), z(2).
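As a complement to the column-removal experiment in this comment, the right singular vector belonging to the smallest singular value shows which combination of parameters the data barely constrain. This is a generic SVD diagnostic, not a description of what licols does internally, and the Jacobian below is a hypothetical stand-in in which column 3 is almost the sum of columns 1 and 2.

```matlab
rng(2);
% Hypothetical Jacobian with one nearly dependent column
J = randn(100, 4);
J(:,3) = J(:,1) + J(:,2) + 1e-9*randn(100, 1);

Jn = J ./ vecnorm(J, 2, 1);        % normalize columns to remove pure unit effects
[~, S, V] = svd(Jn, 'econ');
s = diag(S);                       % singular values, largest first
vWeak = V(:, end);                 % parameter direction that is weakly determined
[~, order] = sort(abs(vWeak), 'descend');
involved = sort(order(1:3))';      % the three parameters dominating that direction
```

Large entries of vWeak point to the parameters that take part in the near dependency, which can be more informative than dropping single columns when every parameter has to stay in the model.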

SA-W on 6 Jun 2023

"I would also mention that the scaling by w is not something you've explored exhaustively. Your bounds and linear constraints were never adjusted for w, so that approach was never implemented properly. If you have an optimization problem f(x) s.t. A*x<=b and you make scaling or any other linear transform x=D*z then the reformulated problem has to transform the constraints as well, so that it becomes f(D*z) s.t. (A*D)*z<=b. I believe you neglected to transform A, lb, and ub, appropriately."
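The reformulation described in this quote can be sketched as follows. Everything here is an illustrative assumption: the two-parameter model stands in for gsim, and the weights d, the bounds, and the constraint A*x <= b are invented for the example. The sketch assumes a diagonal D with strictly positive entries, so that the bounds transform elementwise.

```matlab
% Hypothetical toy model standing in for gsim (linear in x, so J is constant)
t     = linspace(0, 1, 20)';
xTrue = [2; 3e4];
resX  = @(x) x(1)*exp(-t) + 1e-4*x(2)*t - (xTrue(1)*exp(-t) + 1e-4*xTrue(2)*t);
jacX  = @(x) [exp(-t), 1e-4*t];      % Jacobian with respect to x

d  = [1; 1e4];                       % assumed typical magnitudes, x = d.*z
lb = [0; 0];   ub = [10; 1e5];       % bounds on x
A  = [1, 1e-4];  b = 20;             % example linear constraint A*x <= b

resZ = @(z) resX(d.*z);              % residual as a function of z
jacZ = @(z) jacX(d.*z) .* d';        % chain rule: J_z = J_x*diag(d)
lbZ  = lb ./ d;   ubZ = ub ./ d;     % valid because every d(i) > 0
AZ   = A .* d';                      % (A*D)*z <= b, with b unchanged

x0 = [1; 1e4];
z0 = x0 ./ d;                        % same starting point, expressed in z
condX = cond(jacX(x0));
condZ = cond(jacZ(z0));
```

The scaled problem would then be passed to the solver in z, for example lsqnonlin(resZ, z0, lbZ, ubZ), and the result mapped back with x = d.*z. The model is still evaluated at x = d.*z, so the solver never sees parameter values outside the original bounds.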

That's true. I should definitely go deeper into this, but I have doubts that it works. Even a simple shift z = x + 1 causes my pde solver to return NaN because of the different order of magnitude. I think that whatever transformation I apply will cause a similar situation.

"You could add rows to J, or in other words lengthen g^sim if there are additional residual equations you can come up with."

Based on

w=1./vecnorm(J0,1,1) =

0.0004 0.0008 0.0006 0.0017 0.0050 0.0042 0.0065 0.0375 0.1435

I would expand gsim(x) by

gsim*(x) = [gsim(x); (w(9) - w(1))^2]

but I guess the derivative

d[w(9) - w(1)]/dx

cannot be obtained easily, right?

SA-W on 7 Jun 2023

What I wanted to say is "to end up with a SET OF linear systems" to successively solve the pde with a Newton-Raphson scheme.

Anyway, do you have a reference where the procedure of implementing a pde as nonlinear constraint is described? Maybe your own work in case you have done something similar already.

This approach does not seem to be common in the literature of my field.

Matt J on 7 Jun 2023

Edited: Matt J on 7 Jun 2023

No, I don't have a reference, but I think it should be obvious. A PDE, like any other equation, is of the form ceq(x)=0 which is the form that the Optimization Toolbox solvers require for nonlinear equality constraints.
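A rough sketch of that idea, under heavy assumptions: the "PDE" is a hypothetical 1D steady diffusion problem -k*u'' = 1 with zero boundary values, discretized by finite differences, and both the coefficient k and the state u are treated as optimization unknowns in fmincon. None of this reflects the actual model in this thread.

```matlab
n = 9;  h = 1/(n+1);
% Finite-difference matrix for -d2/dx2 with zero Dirichlet boundary values
L = (2*eye(n) - diag(ones(n-1,1), 1) - diag(ones(n-1,1), -1)) / h^2;
rhs   = ones(n, 1);
kTrue = 2;                                   % hypothetical reference parameter
uExp  = (kTrue*L) \ rhs;                     % synthetic "measurements"

% Unknown vector v = [k; u]: parameter and discretized state together
obj     = @(v) sum((v(2:end) - uExp).^2);    % misfit in the state
pdeRes  = @(v) v(1)*(L*v(2:end)) - rhs;      % discretized PDE residual
nonlcon = @(v) deal([], pdeRes(v));          % no inequalities, ceq(v) = 0

v0   = [1; zeros(n, 1)];
lb   = [1e-3; -inf(n, 1)];                   % keep the coefficient positive
opts = optimoptions('fmincon', 'Display', 'off');
v    = fmincon(obj, v0, [], [], [], [], lb, [], nonlcon, opts);
kEst = v(1);
```

In this form the solver never has to solve the PDE at intermediate iterates; it only has to satisfy it at the solution, at the cost of a much larger unknown vector.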


More Answers (1)

John D'Errico on 2 Jun 2023


Edited: John D'Errico on 2 Jun 2023

@SA-W - A full rank does NOT mean the parameters are not linearly dependent. With that high of a condition number, it often does mean they are VERY nearly dependent, just not exactly so. However, poor choices of units can often cause high condition numbers, and Matt has attempted to tell you exactly that. Listen to what Matt is telling you.

Consider these two matrices:

format long g
small = 1e-11;
A = [1 1+small;1 1]
A = 2×2
                         1             1.00000000001
                         1                         1
B = [1 small/4;1 -small/4]
B = 2×2
                         1                   2.5e-12
                         1                  -2.5e-12
cond(A)
ans = 
          400003843933.334
cond(B)
ans = 
              400000000000

Both matrices have almost identically the same (and very large) condition number. However, the A matrix cannot be simply repaired, because the columns are virtually linearly dependent. The B matrix has a problem essentially because of a poor choice of units. The two columns of B are very different and, in fact, are orthogonal to each other. However, the linear algebra will have difficulties with both cases, because the condition number of B is as large as that of A.
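One way to tell the two cases apart in practice is to normalize the columns and look at the condition number again. The sketch below applies this to the same A and B; the column normalization is an illustrative choice of rescaling, not the only possible one.

```matlab
small = 1e-11;
A = [1 1+small; 1 1];
B = [1 small/4; 1 -small/4];

An = A ./ vecnorm(A, 2, 1);   % unit-length columns
Bn = B ./ vecnorm(B, 2, 1);
condAn = cond(An);            % stays huge: the columns are nearly parallel
condBn = cond(Bn);            % close to 1: the columns are orthogonal
```

If a Jacobian behaves like B, a change of parameter units fixes the conditioning; if it behaves like A, rescaling does not help, and the parameterization itself is the issue.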
