# Optimization using lsqnonlin on very distinct data sets that depend on the same variables

Niels on 20 Oct 2016
Commented: Niels on 24 Oct 2016
I am working with some large data sets (N rows of data with one parameter varied between rows, each row consisting of M points), and it is assumed that a single function can accurately describe every row. This function has P fit parameters plus the one parameter that I vary.
Now, M is very large and I cannot afford to run my fitting routine on all N rows of data. Fortunately, my fitting function can be integrated, so I can instead consider the much smaller single-row data set consisting of just N points (one integrated value per row).
Fitting the integrated quantities is fast and gives physically realistic values for my P fit parameters. However, when I then plug those fit parameters back into my original function and compare it to one of the N rows of M points, the result can be way off...
So what I now want to do is build a routine that considers, e.g., 2 of my N rows as well as the integrated data. I tried simply concatenating everything, but the values and numbers of points differ significantly, and in the end I get results similar to those from a single row of M points, at the cost of a slower routine.
How can I set up this combined fitting routine so that each data set is equally important, independent of the big difference between N and M?
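A common way to make blocks of very different sizes contribute equally is to divide each residual block by the square root of its length before concatenating, so every block carries the same total weight in the sum of squares. The sketch below is not from this thread: it uses a made-up toy model in Python with scipy.optimize.least_squares, but lsqnonlin works the same way on a concatenated residual vector.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

# Toy model y(t) = p0 * exp(-(t/p1)^2); one huge "row" of M points and one
# small "integrated-style" set of N points, mimicking the M >> N imbalance.
def model(p, t):
    return p[0] * np.exp(-(t / p[1]) ** 2)

p_true = np.array([2.0, 1.5])
t_big = np.linspace(0.0, 5.0, 5000)      # M = 5000
t_small = np.linspace(0.0, 5.0, 20)      # N = 20
y_big = model(p_true, t_big) + 0.05 * rng.standard_normal(t_big.size)
y_small = model(p_true, t_small) + 0.01 * rng.standard_normal(t_small.size)

def residuals(p):
    # Dividing each block by sqrt(block length) gives every block the same
    # total weight in the sum of squares, however many points it has.
    r_big = (model(p, t_big) - y_big) / np.sqrt(t_big.size)
    r_small = (model(p, t_small) - y_small) / np.sqrt(t_small.size)
    return np.concatenate([r_big, r_small])

fit = least_squares(residuals, x0=[1.0, 1.0], bounds=([0.1, 0.1], [10.0, 10.0]))
```

In MATLAB the same idea applies with lsqnonlin: return [r_big/sqrt(numel(r_big)); r_small/sqrt(numel(r_small))] from the objective function. If the blocks also differ in magnitude, additionally divide each block by a typical scale, e.g. norm of that block's data.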

Niels on 21 Oct 2016
Hi Matt,
My model is basically of the form y( t, k ) = f( t, k, p_0,...,p_n ).
As t is a very large array, it is computationally very expensive to apply lsqcurvefit to each k separately:
k = 1;
p_out = lsqcurvefit( @(p,t) f(p,t,k), p_init, t, y, p_min, p_max, fit_opts);
The above may already take up to 5 minutes to get a decent fit. Solving for all k at once is something my computer does not like at all.
So instead I am using something like
p_out = lsqcurvefit( @F, p_init, k, Y, p_min, p_max, fit_opts);
where Y( k ) = F( k, p_0,...,p_n ); Y is simply y numerically integrated over t, and F is the integral of f over t from t_0 to infinity.
Now I want to get to some intermediate form where I fit Y( k ) vs F( k ), then plug my p_out into f for, e.g., k=1 and k=20 and compare those results to y. I can do both of these separately, but I am stuck getting my p values to converge to something that gives equally good results for Y vs F as well as for y(t,k=1) vs f(t,k=1) and y(t,k=20) vs f(t,k=20), because length(k) << length(t).
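Since the cost is driven by length(t) rather than by the number of rows, one option (my own suggestion, not something stated in the thread) is to fit the selected rows against a uniform subsample of t; for smooth y(t) this usually loses little information while shrinking the residual vector dramatically. A minimal Python sketch with made-up sizes:

```python
import numpy as np

# Hypothetical sizes mirroring the thread: t is huge, we keep only m points.
t = np.linspace(0.0, 10.0, 500_000)
y = np.exp(-t)                    # stand-in for one measured row y(t, k)

m = 2000                          # number of points to keep for the fit
idx = np.round(np.linspace(0, t.size - 1, m)).astype(int)  # uniform thinning
t_sub, y_sub = t[idx], y[idx]     # pass these to the fit instead of t, y
```

The same thinning is one line in MATLAB (e.g. an index vector built with round(linspace(1,numel(t),m))) and combines naturally with the block weighting above, since each block is normalized by its own length anyway.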
Matt J on 21 Oct 2016
Yes, but this is basically a restatement of your original question. What does f(t,k,p_0,...,p_n) look like, and how large are length(t) and length(k)? Without knowing that, we have no way of making informed recommendations.
Niels on 24 Oct 2016
My apologies. Hopefully the following will give you a better idea:
length(t) can be anything between 5e4 and 5e6, and length(k) varies roughly from 10 to 50. The y(t) I fit against can be quite noisy, but the integrated quantities do not suffer from that and reproduce very well when I do not vary k.
In f(t,k,p_0,...,p_n) I basically take a summation of N individual contributions, in a form similar to the snippet below:
function y = f(p,t,k)
% ... Some input checking to make sure all the inputs
% are fed with the correct dimensions ...
% size(t) = [M,1]
% size(p) = [1,2*N+1]
% k is a scalar

% Split up the input parameters
N  = (length(p)-1)/2;
N1 = 1:N;
a  = p( 0*N + N1 );
b  = p( 1*N + N1 );
c  = p( end );

% Vectorized calculation of the output using a simple matrix multiplication
y = exp( -( (t-k) * (1./(a+N1*c)) ).^2 ) * b.';
end
In reality, f is more complicated, with a significantly larger number of input parameters, but it is simple enough to expand and sum with just some permutations and an occasional bsxfun. F(k,p_0,...,p_n) is simply the definite integral of f w.r.t. t from g(k) to infinity.
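For a Gaussian sum of the snippet's form, that definite integral has a closed form via the complementary error function, since the integral of exp(-u^2) from x to infinity equals (sqrt(pi)/2)*erfc(x). A small Python check of this identity (the specific a, b, c, k and lower limit g values are made up for the test; s_i = a_i + i*c matches 1./(a+N1*c) in the snippet):

```python
import numpy as np
from scipy.special import erfc
from scipy.integrate import quad

# Each term b_i * exp(-((t - k)/s_i)^2), with s_i = a_i + i*c, integrates as
#   int_g^inf b_i * exp(-((t - k)/s_i)^2) dt
#       = b_i * s_i * sqrt(pi)/2 * erfc((g - k)/s_i)

def f(t, k, a, b, c):
    i = np.arange(1, len(a) + 1)
    s = a + i * c
    return np.sum(b * np.exp(-((t - k) / s) ** 2))

def F(g, k, a, b, c):
    i = np.arange(1, len(a) + 1)
    s = a + i * c
    return np.sum(b * s * np.sqrt(np.pi) / 2 * erfc((g - k) / s))

# Made-up values purely for the numerical check.
a = np.array([0.5, 1.0]); b = np.array([1.0, 2.0]); c = 0.3
k, g = 2.0, 1.0
num, _ = quad(lambda t: f(t, k, a, b, c), g, np.inf)
```

Having F in closed form like this is what makes the integrated fit cheap: evaluating F costs O(N) per k, independent of length(t). MATLAB's erfc gives the same expression directly.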